AIOps, a term Gartner coined in 2017, is becoming a critical part of next-generation IT. “In a nutshell, AIOps is applying cognitive computing like AI and machine learning techniques to improve IT operations,” said Adnan Masood, who is the Chief Architect of AI & Machine Learning at UST Global. “This is not to be confused with the entirely different discipline of MLOps, which focuses on the machine learning operationalization pipeline. AIOps refers to the spectrum of AI capabilities used to address IT operations challenges–for example, detecting outliers and anomalies in the operations data, identifying recurring issues, and applying self-identified solutions to proactively resolve the problem, such as by restarting the application pool, increasing storage or compute, or resetting the password for a locked-out user.”
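To make the outlier detection Masood describes a little more concrete, here is a minimal Python sketch. It flags unusual samples in a metric using a modified z-score built on the median absolute deviation, then looks up a suggested fix. The metric names, the `REMEDIATIONS` table and the 3.5 cutoff are illustrative assumptions, not the behavior of any particular AIOps product.

```python
from statistics import median

# Hypothetical mapping from a metric name to a remediation step.
# The action strings echo the examples in the quote above.
REMEDIATIONS = {
    "app_pool_latency_ms": "restart the application pool",
    "disk_used_pct": "increase storage",
}


def find_anomalies(samples, cutoff=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the cutoff.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. The 3.5 cutoff is a common rule of thumb, assumed here.
    """
    if not samples:
        return []
    center = median(samples)
    mad = median(abs(x - center) for x in samples)
    if mad == 0:
        # No measurable spread around the median, so nothing is scored.
        return []
    return [
        (i, x)
        for i, x in enumerate(samples)
        if 0.6745 * abs(x - center) / mad > cutoff
    ]


def suggest_remediation(metric, samples):
    """Return a suggested action if the metric shows an anomaly, else None."""
    if find_anomalies(samples):
        return REMEDIATIONS.get(metric, "open a ticket for human review")
    return None


latencies = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101]
print(find_anomalies(latencies))
print(suggest_remediation("app_pool_latency_ms", latencies))
```

A median-based score is used in this sketch because a single large spike inflates the mean and standard deviation, which can hide the very outlier you are looking for in a small sample.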
The fact is that IT departments are often stretched and starved for resources. Traditional tools have usually been rule-based and inflexible, which has made it difficult to deal with the flood of new technologies.
“IT teams have adopted microservices, cloud providers, NoSQL databases, and various other engineering and architectural approaches to help support the demands their businesses are putting on them,” said Shekhar Vemuri, who is the CTO of Clairvoyant. “But in this rich, heterogeneous, distributed, complex world, it can be a challenge to stay on top of vast amounts of machine-generated data from all these monitoring, alerting and runtime systems. It can get extremely difficult to understand the interactions between various systems and the impact they are having on cost, SLAs, outages etc.”
So with AIOps, there is the potential for achieving scale and efficiencies. Such benefits can certainly move the needle for a company, especially as IT has become much more strategic.
“From our perspective, AIOps equips IT organizations with the tools to innovate and remain competitive in their industries, effectively managing infrastructure and empowering insights across increasingly complex hybrid and multi-cloud environments,” said Ross Ackerman, who is the NetApp Director of Analytics and Transformation. “This is accomplished through continuous risk assessments, predictive alerts, and automated case opening to help prevent problems before they occur. At NetApp, we’re benefiting from a continuously growing data lake that was established over a decade ago. It was initially used for reactive actions, but with the introduction of more advanced AI and ML, it has evolved to offer predictive and prescriptive insights and guidance. Ultimately, our capabilities have allowed us to save customers over two million hours of lost productivity due to avoided downtime.”
As with any new approach, though, AIOps does require much preparation, commitment and monitoring. Let’s face it, technologies like AI can be complex and finicky.
“The algorithms can take time to learn the environment, so organizations should seek out those AIOps solutions that also include auto-discovery and automated dependency mapping as these capabilities provide out-of-the-box benefits in terms of root-cause diagnosis, infrastructure visualization, and ensuring CMDBs are accurate and up-to-date,” said Vijay Kurkal, who is the CEO of Resolve. “These capabilities offer immediate value and instantaneous visibility into what’s happening under the hood, with machine learning and AI providing increasing richness and insights over time.”
As a result, there should be a clear-cut framework when it comes to AIOps. Here’s what Alyssa Simpson Rochwerger, the VP of Data and AI at Appen, recommends:
- Clear ability to measure product success (business value outcomes)
- Ability to measure and report on associated performance metrics such as accuracy, throughput, confidence and outcomes
- Technical infrastructure to support—including but not limited to—model training, hosting, management, versioning and logging
- Data set management including traceability, data provenance and transparency
- Low confidence/fallback data handling (this could be either a data annotation or other human-in-the-loop process or default when the AI system can’t handle a task or has a low-confidence output)
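The last item on that list, low-confidence fallback handling, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` type, the `route` function and the 0.8 threshold are hypothetical names and values chosen for the example, not something prescribed by Appen or by any AIOps tool.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str         # the action or class the model proposes
    confidence: float  # model confidence, assumed to be in [0, 1]


def route(prediction, threshold=0.8):
    """Send confident predictions to automation and the rest to a person.

    Returns a (destination, label) pair, where destination is either
    "auto" or "human_review". The threshold is an assumed default.
    """
    if prediction.confidence >= threshold:
        return ("auto", prediction.label)
    return ("human_review", prediction.label)


def auto_rate(predictions, threshold=0.8):
    """Fraction of predictions handled without a human (a simple outcome metric)."""
    if not predictions:
        return 0.0
    handled = sum(1 for p in predictions if route(p, threshold)[0] == "auto")
    return handled / len(predictions)


batch = [
    Prediction("restart_app_pool", 0.93),
    Prediction("increase_storage", 0.41),
]
for p in batch:
    print(route(p))
print(auto_rate(batch))
```

Tracking something like `auto_rate` over time also touches the second item on the list, since it gives the team a number to report alongside accuracy and throughput.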
All this requires a different mindset. It’s really about looking at things in terms of software application development.
“Most enterprise businesses are struggling with a wall to production, and need to start realizing a return on their machine learning and AI investments,” said Santiago Giraldo, who is a Senior Product Marketing Manager at Cloudera. “The problem here is two-fold. One issue is related to technology: Businesses must have a complete platform that unifies everything from data management to data science to production. This includes robust functionalities for deploying, serving, monitoring, and governing models. The second issue is mindset: Organizations need to adopt a production mindset and approach machine learning and AI holistically in everything from data practices to how the business consumes and uses the resulting predictions.”
So yes, AIOps is still in its early days, and there will be lots of trial and error. But this approach is likely to be essential.
“While the transformative promise of AI has yet to materialize in many parts of the business, AIOps offers a proven, pragmatic path to improved service quality,” said Dave Wright, who is the Chief Innovation Officer at ServiceNow. “And since it requires little overhead, it’s a great pilot for other AI initiatives that have the potential to transform a business.”