What Is AIOps?
AIOps is the use of artificial intelligence (AI) to manage, optimize and secure complex IT systems more effectively and flexibly by automating monitoring, diagnostics and remediation. AIOps platforms increasingly combine infrastructure, application and network monitoring. They use correlation and, in some cases, machine learning to identify problems, then trigger automatic responses to current or potential issues or suggest remedies. Many vendors have added AI capabilities to existing application performance or system management tools, or vice versa, and offer AIOps as a service.
The AIOps market was estimated at between $900 million and $1.5 billion in 2020 with a projected compound annual growth rate of about 15% between 2020 and 2025, according to the Gartner Inc. Market Guide for AIOps Platforms.
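To see what that growth rate implies, the estimate can be compounded forward. The short sketch below only applies the standard compound-growth formula to the figures quoted above; the resulting 2025 range (roughly $1.8 billion to $3.0 billion) is arithmetic on those numbers, not a figure taken from the Gartner report.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return value * (1 + annual_rate) ** years


# Figures from the text: 2020 estimate in $ billions, about 15% CAGR, 2020 to 2025.
low_2020, high_2020 = 0.9, 1.5
low_2025 = project(low_2020, 0.15, 5)
high_2025 = project(high_2020, 0.15, 5)

print(f"Implied 2025 range: ${low_2025:.2f}B to ${high_2025:.2f}B")
```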
How Does AIOps Work?
An AIOps platform ingests, indexes and normalizes data on thousands of events per second from components across the IT infrastructure, ranging from PCs and Internet of Things sensors to servers, network routers and firewalls. Advanced data analytics and machine learning (ML) tools then correlate and examine the data, often comparing it with baselines of “normal” activity and with patterns that preceded previous outages, slowdowns or cyberattacks. The tools alert system administrators, security operations staff or business users to potential problems. Some vendors also use AI and ML to screen out “noisy” (irrelevant or incorrect) data to ensure higher-quality analysis.
In addition to alerts, many AIOps platforms can trigger automated responses to issues or recommend manual remediation steps, again based on AI analysis of previous responses.
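The normalize-then-compare-to-baseline flow described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the event fields, the two source formats and the three-standard-deviation threshold are all assumptions chosen for illustration, and real platforms use far richer models.

```python
from statistics import mean, stdev

# Hypothetical raw events from two sources that use different field names and units.
raw_events = [
    {"src": "router", "host": "r1", "latency_ms": 12},
    {"src": "server", "hostname": "s1", "value": 0.25, "unit": "s"},
]


def normalize(event: dict) -> dict:
    """Map a source-specific event onto one common schema (latency in ms)."""
    if event["src"] == "router":
        return {"host": event["host"], "latency_ms": float(event["latency_ms"])}
    if event["src"] == "server":
        value = event["value"] * 1000 if event["unit"] == "s" else event["value"]
        return {"host": event["hostname"], "latency_ms": float(value)}
    raise ValueError(f"unknown source: {event['src']}")


def is_anomalous(value: float, baseline: list[float], threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


baseline_latency = [11, 12, 13, 12, 11, 12, 13, 12]  # assumed "normal" history, in ms
normalized = [normalize(e) for e in raw_events]
alerts = [e for e in normalized if is_anomalous(e["latency_ms"], baseline_latency)]

print(alerts)  # only the 250 ms reading from s1 is flagged
```

A simple statistical baseline like this is easy to reason about but sensitive to seasonality, which is one reason platforms layer machine learning on top of it.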
Omdia Chief Analyst Roy Illsley has identified 10 key characteristics of AIOps tools, while noting that not every product that is advertised as being an AIOps tool addresses all these functions.
1. Performance monitoring
This involves monitoring all the layers of the IT ecosystem, including server, storage and network, as well as all relevant metrics including customer experience and application performance.
2. Collaboration
AIOps systems should be easy to use and intuitive, giving users only the information they need to see to optimize IT operations.
3. Data management
AIOps tools must correlate data from a wide variety of sources and identify new insights from it. While very few solutions will currently work with all data sources, we expect these capabilities to grow over time.
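As a rough illustration of cross-source correlation, the sketch below groups events that hit the same host within a short time window and keeps the groups reported by more than one source. The event fields, the sample data and the 30-second window are hypothetical; production tools typically also use topology and dependency data.

```python
from collections import defaultdict

# Hypothetical events (timestamps in seconds) from three monitoring sources.
events = [
    {"ts": 100, "host": "db1", "source": "network", "msg": "packet loss"},
    {"ts": 104, "host": "db1", "source": "storage", "msg": "high disk latency"},
    {"ts": 107, "host": "db1", "source": "app", "msg": "slow queries"},
    {"ts": 500, "host": "web2", "source": "app", "msg": "HTTP 500 spike"},
]


def correlate(events: list[dict], window_s: int = 30) -> list[list[dict]]:
    """Group same-host events that fall within `window_s` of the group's first event."""
    by_host = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        by_host[event["host"]].append(event)

    groups = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for event in host_events[1:]:
            if event["ts"] - current[0]["ts"] <= window_s:
                current.append(event)
            else:
                groups.append(current)
                current = [event]
        groups.append(current)
    return groups


groups = correlate(events)
# Keep only groups in which more than one source reports trouble.
cross_source = [g for g in groups if len({e["source"] for e in g}) > 1]

for group in cross_source:
    print(group[0]["host"], [e["msg"] for e in group])
```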
4. Security operations
AIOps systems should examine events, incidents and resource usage to identify anomalies that may signal an attack, such as a significant increase in disk I/O activity that could indicate ransomware reading and locking many data files.
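The disk I/O example can be sketched as a rolling-median check. This is one simple heuristic under assumed values (a 10-sample window, a 5x multiplier and made-up IOPS readings), not a description of how any particular product detects ransomware. A spike alone is also not proof of an attack, since backups and batch jobs produce similar patterns.

```python
from collections import deque
from statistics import median


class IoSpikeDetector:
    """Flag samples far above the rolling median of recent disk I/O readings."""

    def __init__(self, window: int = 10, factor: float = 5.0, min_samples: int = 5):
        self.history = deque(maxlen=window)
        self.factor = factor
        self.min_samples = min_samples

    def observe(self, iops: float) -> bool:
        # Wait for a few samples before judging, so the baseline is meaningful.
        enough_history = len(self.history) >= self.min_samples
        spike = enough_history and iops > self.factor * median(self.history)
        self.history.append(iops)
        return spike


detector = IoSpikeDetector()
samples = [200, 220, 210, 190, 205, 215, 4000]  # hypothetical IOPS readings
flags = [detector.observe(s) for s in samples]

print(flags)  # only the final reading is flagged
```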
5. Optimization
AIOps systems should adjust IT systems for maximum efficiency and effectiveness, taking into account everything from cost to flexibility.
6. Automation
AIOps tools must identify the tasks most suited for automation, ranging from simple alerting through to complete process automation to resolve a problem.
7. Analytics and alerting
AIOps systems should employ metric-based reporting and analysis, in which business outcomes are tied to relevant metrics, thus directly linking IT performance to business activity.
8. Platforms and environments
AIOps tools should be able to work agnostically across cloud environments as well as on-premises. This reach should also extend to the IT development teams, and to the most popular development platforms for cloud-native and traditional application development.
9. Operational management
API integration is central to AIOps tools, enabling them to serve as a thin metadata management layer linked to existing management tooling rather than requiring its replacement.
10. Compliance and privacy
AIOps tools should be able to understand when a system is out of compliance or an unusual event has occurred, thus reducing the time required to restore compliance. This includes tracking patch levels, the workloads running on a system and its criticality to the business.
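A minimal sketch of the patch-level part of that check might look like the following. The system names, version numbers, required patch level and 1-to-3 criticality scale are all hypothetical; the point is only that out-of-compliance systems can be ranked so the most business-critical ones are fixed first.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    patch_level: tuple[int, int, int]
    criticality: int  # assumed scale: 1 = low, 3 = business-critical


REQUIRED_PATCH = (2, 4, 0)  # hypothetical minimum patch level


def out_of_compliance(systems: list[System]) -> list[System]:
    """Return systems below the required patch level, most critical first."""
    behind = [s for s in systems if s.patch_level < REQUIRED_PATCH]
    return sorted(behind, key=lambda s: s.criticality, reverse=True)


fleet = [
    System("hr-portal", (2, 3, 9), 1),
    System("payments-db", (2, 1, 0), 3),
    System("web-frontend", (2, 4, 1), 2),
]
to_fix = [s.name for s in out_of_compliance(fleet)]

print(to_fix)  # payments-db first, then hr-portal
```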
What Are the Benefits and Drawbacks of AIOps?
AIOps can not only reduce the number of IT incidents and the mean time to resolve (MTTR) but also enable digital transformation by giving businesses a more agile, flexible and secure IT infrastructure.
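MTTR is simply the average time between an incident being opened and being resolved. The sketch below computes it for three made-up incidents; the timestamps are illustrative only, and organizations differ on exactly when the clock starts and stops.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (opened, resolved) timestamps.
incident_log = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 30)),   # 90 minutes
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 45)),  # 45 minutes
    (datetime(2024, 1, 20, 2, 15), datetime(2024, 1, 20, 3, 0)),  # 45 minutes
]


def mean_time_to_resolve(log: list[tuple[datetime, datetime]]) -> timedelta:
    """Average the open-to-resolved duration across all incidents."""
    total = sum((resolved - opened for opened, resolved in log), timedelta())
    return total / len(log)


mttr = mean_time_to_resolve(incident_log)
print(f"MTTR: {mttr}")  # 180 minutes over 3 incidents = 1:00:00
```

Tracking this value before and after an AIOps rollout is one straightforward way to test the benefit claimed above.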
An Enterprise Management Associates report based on a survey of more than 200 global IT professionals found the top business benefits of AIOps to be improved alignment between IT and the business, higher-quality IT and business services, improved experiences for customers and employees, improved business process efficiency, and accelerated digital transformation.
Forrester Research Inc. analyst William McKeon-White said the most common enterprise goals for AI are improving visibility into IT infrastructure, preventing common issues such as peak-season system slowdowns, reducing the mean time to identify and resolve issues, and improving employee and customer experiences. IT managers polled by EMA also mentioned improvements to DevOps (which combines app development and IT operations to speed new software to market), greater effectiveness and lower cost of cloud operations, operational efficiency, meeting service-level agreements (SLAs) and improved security.
AIOps can also free system administrators from routine work so they can resolve unusual or transient issues that can have a major business impact and make long-needed but often-delayed improvements to system reliability and recovery, McKeon-White said.
But even customers who are highly pleased with AIOps note the high cost of the platforms and the time required to implement them. “Successful deployments require time and effort, including a structured road map by the end user,” according to the Gartner Market Guide for AIOps Platforms. “Implementations typically run into a number of problems, including data ingestion, providing contextually relevant analysis and long time to value…measured in months or years.”
Other challenges include analyzing data from legacy systems as well as modern cloud-based platforms, and translating technical insights into terms business managers can understand. Implementing AIOps can also require changes to processes and trigger “turf wars” with system and security administrators who don’t trust AIOps to manage critical systems or fear it threatens their jobs.
AIOps Strategy Tips
The first requirement for AIOps, said Forrester’s McKeon-White, is to ensure it has access to data from all the business and IT systems that are critical to the business. If the data feeding your AIOps solution “… only covers 20% of your environment, it will only be about 20% effective,” he said.
The EMA study also showed “a strong correlation between platform consolidation and AIOps success” because a single AIOps platform can best gather and analyze data from multiple systems within today’s complex applications. The organizations most successful at AIOps also had a strong commitment to using it to automate as many IT processes as possible, the study showed.
Industry observers recommend evaluating AIOps platforms for the types of AI and automation they support, their ease of integration with other monitoring and management platforms, and their ability to monitor and troubleshoot legacy as well as newer cloud-based applications.
Another important requirement is the ability to link AIOps findings and recommendations to business-level benefits such as fewer outages, improved business process efficiencies, reduction in incidents and improved employee or user satisfaction. AIOps platforms that can identify these links between IT components and business processes, and describe them in reports customized for business users, help drive adoption and funding.
Examples of AIOps
Omdia’s Illsley identified the following vendors in the analyst firm’s “Omdia Universe: Selecting an AIOps Solution, 2021-2022.” These vendors’ products address Omdia’s requirements of AIOps tools to varying degrees.
BMC has the most consistently high scores across functional areas, with a “wide and inclusive range of IT operational management capabilities.” BMC approaches the AIOps market with a goal of improving the signal-to-noise ratio that confronts IT every day, increasing responsiveness around incidents and root cause analysis, Illsley said.
Illsley noted Digitate’s strength around automation, offering easy, out-of-the-box consumption to IT; and security operations, where it delivers a closed-loop solution for IT security and compliance requirements.
IBM’s entries in the AIOps space are its Cloud Pak for Watson AIOps and Instana (the latter acquired in 2020); these tools are complemented by offerings from the wider IBM portfolio. Cloud Pak for Watson AIOps was strongest in performance monitoring and in platforms and environments.
LogicMonitor aims to help IT operations to become more predictive and prevent failures through automated actions. Illsley sees its top strengths as performance monitoring and optimization.
In Illsley’s evaluation, ServiceNow’s biggest strength is performance monitoring, propelled by its IT service management and IT operational management capabilities, which allow it to gather and correlate the data needed for performance monitoring. Its second-strongest category was compliance and privacy; it can ensure continuous compliance and employs virtual agents and prebuilt governance, risk and compliance (GRC) conversations, delivering instant resolution to common requests.
Splunk’s strongest area in Illsley’s evaluation was performance monitoring, with a range of capabilities including application monitoring and troubleshooting, infrastructure monitoring, digital experience monitoring, log investigation, service insight, event management, and incident response. Splunk also performs well in the data management category, with a real-time data platform that’s able to parse and filter log, event, metric and trace data as it is ingested and then forward it into a searchable index.
PagerDuty scored highly for automation in Illsley’s evaluation, ahead of the other vendors that he looked at. The automation functionality enables users to build projects (covering a wide range of IT operational management activities) that contain multiple jobs, and each job can have a unique workflow while sharing the same access control list. PagerDuty is also strong in operational management, capable of offering recommendations for process improvement within several days or weeks.
StackState is focused on managing in a cloud-native world, aimed at delivering observability of the entire IT estate. Illsley found that its strongest capability is in incident analysis, noting integration with a range of third-party systems that supply data. The product is technology-agnostic, meaning that it can incorporate data from any source using a software development kit (SDK).
Illsley notes that Sumo Logic is strongest in analytics and alerting; its analytics capabilities span time series, log and trace data and identify correlations between application, services, orchestrator and infrastructure layers of the modern application stack. This breadth enables the product to speed problem diagnosis and root cause analysis. It also performed well in collaboration, providing good functionality for developers, according to Illsley.
Illsley lauded VuNet Systems’ ability to link its findings to business outcomes. It “correlates the business transaction journey by bringing together the business context and IT environment across both infrastructure and applications,” Illsley said. It can also segregate resources by line of business, leading to more targeted analysis of resource usage. VuNet also performed well in analytics and alerting, using deep learning to baseline the behavior of “golden signal” data for probabilistic forecasting.
Netreo offers vertical-specific versions of its AIOps tool, an advantage over competitors, most of which take a broadly horizontal approach. Its top strengths are in performance monitoring and collaboration. It monitors KPIs such as CPU, memory, disk and network bandwidth. It integrates with ServiceNow and other IT service management tools, and it provides APIs that enable operations centers to create customized dashboards and views without any SQL or scripting requirements.
Customers report that AI and machine learning are a valuable, and often essential, solution to the challenge of optimizing and securing complex mission-critical applications and services. While implementation and license costs for infrastructure-wide coverage can seem daunting, such complete visibility yields the greatest benefits. Despite costs and delays, “There is no future of IT operations that does not include AIOps,” said the Gartner report. “This is due to the rapid growth in data volumes and pace of change … that cannot wait on humans to derive insights.”