Slash MTTR with Intelligent Automation for AIOps

Author: Guy Nadivi

There seems to be confusion in the marketplace about what exactly the term “AIOps” means, but there’s much less confusion about what it can do: improve IT’s customer satisfaction scores by reducing noise, lowering call volume to the service desk, and slashing MTTR.

These are the types of benefits every IT organization is demanding, and the good news is they’re attainable right now.

Ayehu has partnered with Edge Technologies to show you a vision of what that looks like, and give you a glimpse at the promise that AIOps can bring to your IT organization.

Many of you have worked for years in the IT Operations and Systems Management space. Some of you may recall that in the mid-‘90s, Enterprise Systems Management and Business Service Management (or BSM for short) emerged as new disciplines that would bring together distributed systems and mainframes into a single pane of glass to solve problems. As you may know, Gartner killed off the BSM category in 2016 because vendors failed to deliver on these promised benefits.

In many large enterprises, the picture today still remains the same. Does this scene look familiar to you?

The CIO is still asking, “Why has customer experience dropped for our core service?” The IT Ops Manager is unsure what the cause might be, as everything looks good thanks to fantastic “monitoring”. And the SRE can’t make sense of any of the screens because he or she is suffering from information overload and isn’t sure where to look. No wonder MTTR is high!

Even with today’s AIOps vendors, and a market where new ones seem to be entering the space every week, the promise of universal views into your operations remains elusive.  Nevertheless, it’s still a highly sought-after goal.

So the question is, what is preventing progress towards that goal?

Today, we continue to see knowledge and visibility silos across the enterprise, from business units, support, operations, and engineering functions all the way through to 3rd-party service providers.

This is one of the main challenges to overcome if AIOps is going to succeed. Internal politics, tool proliferation, and un-integrated workflows continue contributing to the slow adoption of AIOps.

Sound familiar?

The promise of a “single pane of glass” never materialized, leaving teams to use point products with limited integration and different data formats. The result? A huge and costly inventory of tools to manage and operate, leading to more frustration.

It’s widely accepted across the industry that most monitoring dashboards today fail to provide the operational views the business needs.

AIOps aims to fully automate IT Operations workflows, but the reality today is that enterprises still struggle with tool sprawl resulting in the “swivel chair” effect. Your triage and remediation workflows are still very much reactive in nature, but the goal is to prevent incidents from happening in the first place as much as possible, right?

Also, in our experience over the years, the tools used today are more than likely to be replaced at some point, so the best approach is to have a vendor-agnostic data visualization and integration solution for your dashboarding needs. The tools supplying the dashboard data feeds will come and go. Replacing them is a simple configuration change in Edge.
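To make the vendor-agnostic idea concrete, here is a minimal sketch of one way to decouple a dashboard from the tools feeding it. It is not Edge's implementation: the tool names, field names, and the `ADAPTERS` registry are all hypothetical. Each tool gets a small adapter that maps its payload onto one common event shape, and a single configuration value selects which adapter is active.

```python
from typing import Callable, Dict, List

# Common event shape the dashboard consumes; field names are illustrative.
Event = Dict[str, str]

def from_tool_a(raw: dict) -> Event:
    """Adapter for a hypothetical monitoring tool 'A'."""
    return {"host": raw["hostname"], "severity": raw["sev"].lower(), "message": raw["msg"]}

def from_tool_b(raw: dict) -> Event:
    """Adapter for a hypothetical replacement tool 'B'."""
    return {"host": raw["node"], "severity": raw["level"].lower(), "message": raw["text"]}

# Registry of known adapters (hypothetical names).
ADAPTERS: Dict[str, Callable[[dict], Event]] = {
    "tool_a": from_tool_a,
    "tool_b": from_tool_b,
}

# Swapping the monitoring tool means changing this one setting, not the dashboard.
config = {"alert_feed": "tool_a"}

def normalize(raw_events: List[dict], cfg: dict) -> List[Event]:
    """Convert raw tool payloads into the common event shape."""
    adapter = ADAPTERS[cfg["alert_feed"]]
    return [adapter(raw) for raw in raw_events]
```

With this structure, retiring tool A in favor of tool B touches only the `alert_feed` setting, while everything downstream of `normalize` keeps working against the same event shape.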

In order to break the knowledge and visibility silo challenges and create intelligent operations dashboards for increased AIOps adoption, think of the process in three parts:

Part 1:   Integrate all required data sources, ranging from customer experience to your enterprise IT domains, to give business and service health views by role: for example, executive, manager, and analyst views.

Part 2:   Integrate your existing event management, monitoring, and IT service management tools at the data and web layers to maximize your existing tool investments, skills, and standard operating procedures to become more proactive than ever before.

Part 3:   Integrate your process automation tools (such as Ayehu) to create convenient and frictionless workflows that can be executed in either attended or unattended mode.
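The difference between attended and unattended execution in Part 3 can be illustrated with a short sketch. This is an assumption-laden illustration, not Ayehu's API: `run_workflow`, `restart_service`, and the `approve` callback are hypothetical names. In attended mode a human must approve the action before it runs, while in unattended mode it runs immediately.

```python
from typing import Callable, Optional

def run_workflow(action: Callable[[], None],
                 attended: bool,
                 approve: Optional[Callable[[str], bool]] = None) -> str:
    """Run a remediation action; in attended mode, ask a human first."""
    if attended:
        # No approver, or the approver declined: do nothing.
        if approve is None or not approve(action.__name__):
            return "skipped"
    action()
    return "executed"

# Hypothetical remediation step that records what it did.
audit_log = []

def restart_service() -> None:
    audit_log.append("restart_service")

# Unattended: runs without asking anyone.
print(run_workflow(restart_service, attended=False))

# Attended: the approver (here a stub that always declines) decides.
print(run_workflow(restart_service, attended=True, approve=lambda name: False))
```

In practice the `approve` callback would be whatever channel your team already uses to confirm an action, such as a chat prompt or a ticket approval.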

Now that we better understand the problems and obstacles in the way of making progress, let’s walk through the process of creating ideal intelligent operations dashboards for your AIOps initiatives by uniquely combining your data and tools into role-based views of your business and services.

When we think about digital transformation and the outcomes businesses are looking for, one of the goals CIOs have longed to achieve is ensuring that business and enterprise IT are completely aligned. This has been a goal for as long as most of us can remember!

To reflect that in our intelligent operations dashboards, let’s start from the top level (see graphic below), which is a set of first-level business, customer, and end-user experience (EUE) dashboards that appeal to all levels of the organization.

The second level is a triage dashboard, designed to allow teams to quickly identify whether the server, network or application layer is the source of an outage or service health issue.
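As a rough illustration of the triage idea, the sketch below ranks layers by health so the worst offender surfaces first. The 0-100 health scores, the threshold of 80, and the `triage` function are assumptions made for this example, not part of any product.

```python
from typing import Dict, List

def triage(health: Dict[str, int], threshold: int = 80) -> List[str]:
    """Return the layers whose health score is below threshold, worst first."""
    degraded = [(layer, score) for layer, score in health.items() if score < threshold]
    return [layer for layer, _ in sorted(degraded, key=lambda item: item[1])]

# Hypothetical snapshot of per-layer health scores (0-100).
snapshot = {"server": 97, "network": 62, "application": 78}
print(triage(snapshot))  # ['network', 'application']
```

Here the network layer is the first place to look, with the application layer a likely knock-on effect.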

The third level is a dependency-mapping dashboard that links application, network, and server infrastructure together in topology views to understand the business impact.

The fourth level is individualized dashboards specifically designed for teams and dedicated roles: application, infrastructure, and network monitoring dashboards. This level of dashboard is where SMEs can directly access your existing best-in-class tools using Edge’s unique web UI proxying capability.

The fifth level gives you access to your raw data (logs, events, packet traces, and call stack traces, for example) so that detailed analysis can be performed in the context of the issue being investigated.

By combining your data sources and tools into universal views using a single platform like Edge, you can provide appropriate dashboards to your executives, management, and SMEs that provide them access to the content they need and tasks they need to perform to be successful in their daily jobs.

By combining business and related service health metrics with the power of integration with your data and tools, you can rapidly identify root cause, fix the problem for good, and slash key performance indicators such as MTTR. Many Edge customers report having happier customers, greater alignment between business and IT, eradication of visibility silos, and overall better decision making and outcomes from their deployment.

Not least of all, their most valuable assets (people) are more successful in meeting their goals and performing their job tasks.

Now let’s talk a bit about automation.

Digital Transformation is a buzzword you hear a lot about these days.  It doesn’t have one standard definition but can basically be understood to mean the collection of technology, process, and even cultural disruptions an organization adopts to maximize its competitiveness in the 4th industrial revolution.

Those technology disruptions can include things like cloud computing, artificial intelligence, chatbots, and of course automation.

The process disruptions include things like Agile or Six Sigma, and a cultural disruption might be something like repositioning the organization’s focus to be better aligned with the customer journey.

For IT departments, digital transformation ultimately boils down to optimizing and accelerating delivery of computing services, regardless of whether the customer is external or internal.

When it comes to incident monitoring, one thing an IT department can do as part of its digital transformation is to consolidate the visualization of all its various monitoring tools into a single pane of glass, as Edge Technologies enables. A unified dashboard providing a 360° view of operations can also provide an extraordinary opportunity not only to centralize incident monitoring but also to automate incident remediation. That represents a big step forward in the digital transformation of data centers, and a perfect example of how 1 plus 1 can sometimes equal 3.

A recent paper published by Gartner (ID G00390283 – October 9, 2019) advised its readers that an ideal performance monitoring dashboard framework must aim to “Provide for the rapid triage and remediation of performance issues…”.

No argument there. Ayehu and Edge Technologies agree that combining automation with performance monitoring is central to an ideal dashboard framework.  But perhaps the most important word to emphasize in Gartner’s recommendation is “rapid”.

Unfortunately, “rapid” is not an adjective that the vast majority of service desks can use to describe their MTTR today.

MetricNet, the IT consulting firm that publishes benchmarks, performance metrics, and scorecards for a variety of IT-related activities, claims that the average incident MTTR is 8.40 business hours.  If you’re an end user in an organization who just submitted a ticket to the help desk, you do NOT want to hear that it will take an average of 8.40 business hours to remediate your issue.  On the contrary, you want to know that your IT department is doing everything it can to expedite a resolution for your incident, before it starts hampering your personal productivity.

When it comes to MTTR, your mileage may vary of course, depending on your IT organization’s ticket backlog, user population density, and complexity of tickets handled.
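To benchmark your own number, MTTR is simply the mean resolution time across incidents. The sketch below assumes hypothetical ticket records holding an opened and a resolved timestamp, and it measures elapsed wall-clock hours; a business-hours figure like MetricNet’s would also need to exclude nights and weekends.

```python
from datetime import datetime
from statistics import mean

# Hypothetical ticket records: (opened, resolved) timestamps.
tickets = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 13, 0)),   # 4 hours
    (datetime(2020, 1, 6, 10, 0), datetime(2020, 1, 6, 22, 0)),  # 12 hours
    (datetime(2020, 1, 7, 8, 0), datetime(2020, 1, 7, 16, 0)),   # 8 hours
]

def mttr_hours(records):
    """Mean time to resolve, in elapsed hours (not business hours)."""
    return mean((resolved - opened).total_seconds() / 3600
                for opened, resolved in records)

print(mttr_hours(tickets))  # 8.0
```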

Regardless, one universal factor that’s slowing down almost all IT organizations is the ever-increasing user demand for IT services, which often leads to growing system complexity in your environment to accommodate that growth, and ultimately results in ever-increasing pressure on your staff to keep up.

However, people don’t scale very well.  Even the very best data center workers can only do so much.  At some point, and that point is pretty much right now, automation has got to do more and more of the repetitive, tedious, laborious tasks all this growth in demand for services and increased system complexity is creating.

That’s why consolidating visualization of all your monitoring tools into a single pane of glass, and incorporating automated incident remediation into that dashboard, can give your IT department the critical boost it needs to overcome the lack of human scalability.

If you’re interested in test driving Ayehu NG v1.6 with all its cool new features, download your very own free 30-day trial version from the link below:

https://info.ayehu.com/download-free-30-day-trial-ng