Automating Remediation of Splunk Alerts with Ayehu

Author: Guy Nadivi

Many of our customers use Splunk, the market leader in its space. Because Splunk generates such a large volume of alerts, we're often asked how Ayehu can help offset some of the laborious manual work involved in remediating them. We're going to answer that question with a use case many of you will find very familiar: freeing up disk space on a server (with a slight twist).

Remediating low disk space is on our list of the top 10 highest-value automation use cases. Ayehu can automate the toil out of that process using a combination of Splunk, ServiceNow, Slack, and Ayehu NG.
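To make the cleanup step more concrete, here is a minimal Python sketch of the kind of decision logic a disk-space remediation workflow might run. It is not Ayehu NG's actual implementation. The target free-space percentage, the minimum file age, the `CandidateFile` type, and the `plan_cleanup` function are all hypothetical, chosen only for illustration. The idea is to delete eligible files oldest-first until free space reaches a target. If the target still can't be reached, a workflow could escalate to a human, for example by opening a ServiceNow ticket and posting to Slack; that escalation path is an assumption here, not something shown.

```python
from dataclasses import dataclass

# Hypothetical policy values for illustration only (not Ayehu or Splunk defaults).
TARGET_FREE_PCT = 20.0  # desired free space after cleanup, as % of the disk
MIN_AGE_DAYS = 7        # never touch files newer than this


@dataclass
class CandidateFile:
    """A file the workflow is allowed to delete (e.g. old temp or log files)."""
    path: str
    size_bytes: int
    age_days: int


def plan_cleanup(total_bytes, free_bytes, candidates,
                 target_free_pct=TARGET_FREE_PCT, min_age_days=MIN_AGE_DAYS):
    """Choose files to delete, oldest first, until free space meets the target.

    Returns (files_to_delete, projected_free_pct, target_met).
    """
    target_free = total_bytes * target_free_pct / 100.0
    projected_free = free_bytes
    to_delete = []

    # Only files old enough are eligible; the oldest are removed first.
    eligible = sorted(
        (c for c in candidates if c.age_days >= min_age_days),
        key=lambda c: c.age_days,
        reverse=True,
    )
    for c in eligible:
        if projected_free >= target_free:
            break  # enough space freed; stop deleting
        to_delete.append(c)
        projected_free += c.size_bytes

    projected_pct = 100.0 * projected_free / total_bytes
    return to_delete, projected_pct, projected_free >= target_free
```

When `target_met` comes back `False`, the automated cleanup alone wasn't enough, which is the natural point to hand off to a person instead of deleting more aggressively.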

Let's talk a bit about Splunk. It will come as no surprise to most of you that Splunk continues to be a market leader in its category. Here's Gartner's 2020 Magic Quadrant for the SIEM market, showing Splunk just edging out IBM as the highest entry in the upper-right Leaders quadrant.

In case the chart is difficult to read, the y-axis, where Splunk ranks higher than everyone else, measures Ability to Execute.

The x-axis measures vendors by their Completeness of Vision, and Splunk is doing pretty well on that metric, too.

This is clearly one reason Splunk is viewed as a market leader.

Being a market leader often translates into higher market share. Not surprisingly, Splunk is now #1 in market share with 16.5%. It recently dethroned IBM, which is now #2 with 13.2%. Rounding out the top 3 is Microsoft with 8.4%.

As of the end of Fiscal Year 2019, Splunk reported 19,400 customers.

According to Gartner, Splunk has an astounding 30.4% growth rate.

And 92 of the Fortune 100 are Splunk customers.

The reason Splunk is doing so well, as a lot of you already know, is that it's great with machine data.

Splunk captures data from logs, web servers, and lots of other places. It then indexes that data to enable flexible searching and fast retrieval. Splunk can also correlate that data, which often reveals relationships between seemingly unrelated events and helps accelerate root cause analysis. Finally, Splunk can visualize the data in dashboards, graphs, and other outputs.

However, the Splunk output most people in IT operations are probably familiar with is alerts. Boy, oh boy, can Splunk generate a lot of alerts!

And you know what that often leads to? Alert fatigue. Let’s face it, prior to the pandemic your service desk was already pretty overwhelmed. Now with the added burden of everyone working from home, they’re having a hard time keeping up.

Just how serious is alert fatigue? I’m going to address that with this brief quote:

‘There are too many security alerts coming in, and not enough people and time to deal with them all. In fact, approximately 64% of security tickets generated per day are not being worked. Let that sink in. The majority of security alerts received by security teams are not being analyzed and resolved. This is the essence of “alert fatigue”.’

And who is that quote from? Splunk themselves. They posted those exact words on their website earlier this year (Splunk Blogs – January 17, 2020).

Now this is a quote specifically about security tickets, but everyone knows it’s the exact same story in network operations where you have alerts flying at you from every direction 24×7.

The solution to alert fatigue, and really the solution to freeing up people from a lot of the laborious, repetitive, predictable tasks that comprise so much of IT operations, is automation.

Automation is going to:

  • Deflect tickets away from your service desk, which in turn allows technicians to focus on higher value projects
  • Reduce and/or eliminate errors, which has the added benefit of reducing and/or eliminating rework, an often overlooked but significant drain on resources
  • Save time and money for the service desk, the IT department, and ultimately your organization
  • Almost certainly increase IT's customer satisfaction scores, an increasingly important KPI that in many cases is linked directly to individual bonus compensation

BTW, I'm sure many of you are familiar with PwC, also known as PricewaterhouseCoopers. It's one of the Big Four accounting firms and the second-largest professional services network in the world. Since March 2020, PwC has been regularly surveying CFOs around the globe to track their sentiment in response to the COVID-19 crisis. In its most recent survey, conducted during the weeks of June 1 and June 8, PwC asked 989 CFOs from 23 countries and territories about their top priorities going forward.

The response from the CFOs was that "…50% report they plan to accelerate automation and new ways of working."

So that’s the direction things are going in – automation. Actually, many of you know firsthand it was already going in that direction, but COVID-19 has unexpectedly expedited things.

Speaking of automation, Ayehu doesn’t just automate activities in Network Operations Centers.

Many of our customers use the Ayehu NG platform to also automate activities in their Security Operations Centers.

That makes sense, right? Splunk can send an alert notifying you about low disk space on a network drive, and it can also send an alert that a ransomware attack is underway on a server. In both cases, the alert can be sent to Ayehu NG, where an automated workflow (or playbook, if you prefer) carries out the remediation response.
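As a rough illustration of how one alert stream can fan out to different playbooks, here is a minimal Python sketch that parses the JSON body sent by Splunk's webhook alert action and picks a workflow based on the name of the saved search that fired. Splunk's webhook payload includes fields such as `search_name`, `sid`, and `result` (the first result row). The search names, the playbook names in `PLAYBOOKS`, the `host` field inside `result`, and the `route_alert` function are all hypothetical. Ayehu NG's own intake interface is not shown, so this is a generic sketch rather than how Ayehu NG actually receives alerts.

```python
import json

# Hypothetical mapping: Splunk saved-search name -> playbook to run.
# Both the search names and the playbook names are illustrative only.
PLAYBOOKS = {
    "Low Disk Space": "free_disk_space",
    "Possible Ransomware Activity": "isolate_host",
}

# Hypothetical fallback: unrecognized alerts go to a human for triage.
DEFAULT_PLAYBOOK = "open_ticket_for_triage"


def route_alert(raw_body, playbooks=PLAYBOOKS):
    """Parse a Splunk webhook alert body and decide which playbook to run."""
    payload = json.loads(raw_body)
    search_name = payload.get("search_name", "")
    # "result" holds the first row of the search results; its fields depend
    # entirely on the search, so "host" is an assumption about that search.
    result = payload.get("result") or {}
    return {
        "playbook": playbooks.get(search_name, DEFAULT_PLAYBOOK),
        "host": result.get("host"),
        "sid": payload.get("sid"),
    }
```

Keying on the saved-search name keeps the routing table in one place, so the same intake path can serve both network-operations alerts (low disk space) and security alerts (ransomware) without separate integrations.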

In fact, when it comes to security, many of the attacks themselves are automated, and there’s simply no way humans can respond quickly enough.

So if the attack is automated, shouldn’t the response to defend against it be automated too?

It should be, and you can automate all these kinds of things for both domains from a single pane of glass with Ayehu NG.

If you're interested in test-driving Ayehu NG and reducing alert fatigue in your organization, please visit our website to download your own free 30-day trial version today.