How to Automate Incident Response for Splunk Alerts in Minutes

Let’s talk about Splunk, a leader in the Security Information and Event Management (SIEM) market. BTW, you can always tell who the market leader is in any category when its competitors start touting themselves as the ones who will eradicate that company. Recently, one of Splunk’s competitors described themselves as “Splunk Killers”, reaffirming that Splunk is indeed at the head of its class in that segment.

In Gartner’s 2018 Magic Quadrant for the SIEM market, Splunk appears higher than everyone else and further to the right than anybody but IBM. What this means is they excel above all other competitors on the y-axis of the Magic Quadrant, which is a measurement of “Ability to Execute.” On the x-axis of the Magic Quadrant, which measures “Completeness of Vision,” they exceed almost everyone except IBM.

Score highly on those two measurements, and Gartner considers you a Market Leader.

Market share is another key indicator of market leadership, and here Splunk is ranked No. 2 with 13.7% market share. Only IBM has a larger market share when it comes to SIEMs.

Thanks to Splunk’s January 31, 2019 Form 10-K filing with the SEC, we also know they have 17,500 customers in more than 130 countries, including 90% of the Fortune 100. Another clear indication that they are a leader in this market.

With a market position like that, it seems worthwhile to talk about how to quickly and easily automate incident remediation for Splunk alerts in minutes.

As many people know, Splunk produces software for capturing, indexing, correlating, searching, monitoring, and analyzing machine-generated big data.

Some sources of that data include logs for Windows events, Web servers, and live applications, as well as network feeds, metrics, change monitoring, message queues, archive files, and so on.

Generally, these data sources can be categorized as:

  • Files and directories
  • Network events
  • Windows sources
  • And the catch-all category of “other sources”

There are a number of outputs and outcomes Splunk generates from this data, including:

  • Analyzing system performance
  • Troubleshooting failure conditions
  • Monitoring business metrics
  • Creating dashboards to visualize and analyze results
  • And of course storing and retrieving data for later use

That’s A LOT of data. The more systems Splunk monitors, and the more those systems grow, the greater the volume of machine data that gets generated. This is becoming a problem because IT and security operations are getting inundated by all this data. It doesn’t come only from Splunk, but Splunk accounts for a big chunk of it.

Every time there’s an incident, an event, a threshold being crossed, etc., new data is generated, adding to the surge already flooding over IT and Security Operations. And it’s only getting worse.

Ultimately, it’s people who have to deal with all this data, and the problem is, (as we often say) people don’t scale very well.

Even the very best data center workers in NOCs and SOCs can only handle so much. At some point – and that point is pretty much right now – automation has to take on a greater share of the task burden all this growth in data is necessitating.

Why automation? Because people may not scale very well, but automation DOES! And if you’re in one of these overwhelmed data centers, that should be music to your ears.

Here are just a few of the ways automation can bring relief to NOCs and SOCs drowning in Splunk data:

Triggering Workflows

Let’s say Splunk detects that a corporate website has been hacked and defaced. This event can trigger an automatic workflow that quickly restores the website to its pre-defacement state. In fact, an automation platform like Ayehu can do this MUCH quicker than humans could do manually once they got the alert. Restoring the website automatically and almost instantaneously minimizes the damage to corporate reputation, not to mention the threat to job security because the defacement happened in the first place.

Remediating Incidents

In addition to the example of remediating a website defacement incident, let’s consider a situation where Splunk generates an alert about a specific machine due to some observed suspicious activity. Ayehu can remotely lock it either automatically or at the SOC analyst’s manual command, to mitigate any damage until a hands-on inspection can take place. Furthermore, this automated incident remediation workflow could also include doing things like deactivating that user’s Active Directory credentials, turning off their card key’s ability to swipe in or out of a building, etc.
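The lock-then-inspect pattern described above can be sketched as a small workflow with an optional approval gate. This is only an illustration in Python, not how Ayehu implements it: `lock_machine`, `disable_ad_account`, and `revoke_badge` are hypothetical stand-ins that just record what a real integration would do, and the host and user names are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RemediationRun:
    """Records the actions taken for one suspicious-machine alert."""
    host: str
    user: str
    actions: List[str] = field(default_factory=list)


# Hypothetical stand-ins: a real integration would call the endpoint,
# Active Directory, and badge systems. Here they only record the intent.
def lock_machine(run: RemediationRun) -> None:
    run.actions.append(f"locked {run.host}")


def disable_ad_account(run: RemediationRun) -> None:
    run.actions.append(f"disabled AD account {run.user}")


def revoke_badge(run: RemediationRun) -> None:
    run.actions.append(f"revoked badge access for {run.user}")


def remediate(host: str, user: str,
              approve: Callable[[str], bool] = lambda step: True) -> RemediationRun:
    """Run each step after asking `approve`. The default approves everything
    (fully automatic); an analyst prompt could be passed in instead."""
    run = RemediationRun(host, user)
    for step in (lock_machine, disable_ad_account, revoke_badge):
        if approve(step.__name__):
            step(run)
        else:
            run.actions.append(f"skipped {step.__name__} (not approved)")
    return run


# Fully automatic run versus one where the analyst approves only the lock.
auto = remediate("ws-042", "jdoe")
manual = remediate("ws-042", "jdoe", approve=lambda step: step == "lock_machine")
print(auto.actions)
print(manual.actions)
```

Passing the approval decision in as a function keeps the same workflow usable both hands-off and with a human in the loop.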

Data Enrichment

This task is well known to anyone who’s ever had to perform cybersecurity forensics during and after an incident. It involves aggregating all the information a SOC analyst needs to make an informed decision about what’s happening in real-time, or what happened as part of an after-incident evaluation. This can be a laborious manual task, and certainly one that’s difficult to script out.

If your automation platform easily integrates with just about anything in a typical, heterogeneous IT environment, however, then it can gather this critical information very rapidly as well as add more precise context to it about the nature of the incident. This will greatly reduce time-to-decision-making for SOC analysts, which is vital when, for example, you’re watching a ransomware virus swiftly encrypt your enterprise data and you need to decide on a course of action fast.
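To make the enrichment idea concrete, here is a minimal Python sketch that gathers context for an alert from several sources and keeps going when one of them fails. The three lookup functions and their canned data are assumptions for illustration; real ones would query an asset inventory, a threat-intelligence feed, a directory service, and so on.

```python
from typing import Callable, Dict


# Hypothetical lookups returning canned data for illustration only.
def asset_lookup(host: str) -> dict:
    return {"owner": "finance", "os": "Windows 10"}


def threat_intel_lookup(host: str) -> dict:
    return {"known_bad": False}


def recent_logins_lookup(host: str) -> dict:
    # Simulates a source that is unavailable during the incident.
    raise TimeoutError("directory service unreachable")


SOURCES: Dict[str, Callable[[str], dict]] = {
    "asset": asset_lookup,
    "threat_intel": threat_intel_lookup,
    "recent_logins": recent_logins_lookup,
}


def enrich(alert: dict) -> dict:
    """Return a copy of the alert with context from every source that
    answered, plus a record of the sources that failed."""
    enriched = dict(alert)
    enriched["context"] = {}
    enriched["errors"] = {}
    for name, lookup in SOURCES.items():
        try:
            enriched["context"][name] = lookup(alert["host"])
        except Exception as exc:  # one failing source should not block the rest
            enriched["errors"][name] = str(exc)
    return enriched


report = enrich({"host": "ws-042", "rule": "possible ransomware"})
print(report["context"])
print(report["errors"])
```

Recording failures alongside the gathered context lets the analyst see at a glance what is missing rather than waiting on a stalled lookup.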

Opening Tickets

Just about every data center uses an ITSM platform like ServiceNow, JIRA, BMC Remedy, or one of many others. It’s very important to document what steps were taken to remediate an incident or conduct a cybersecurity forensics investigation. SOC analysts are pretty overwhelmed these days, and often don’t have the time to do that. When they do have time, they often don’t document as thoroughly as necessary in order to provide a complete picture of what transpired.

An automation tool like Ayehu can do this much quicker, and in real-time during workflow execution, so everything is properly documented, and nothing slips through the cracks.

Now let’s walk through the flow of events that uses Splunk data and alerts as triggers for actions.

We call this flow a closed-loop, automated incident management process. It starts out with Ayehu NG creating an integration between Splunk and whatever IT Service Management or help desk platform you’re using, be it ServiceNow, JIRA, BMC Remedy, etc.

When Splunk generates an alert or any kind of data you want to act upon, Ayehu intercepts it via the integration point. It will then parse it to determine the underlying incident, and launch the appropriate workflow for that situation, whether it be remediating that specific underlying incident, gathering information for forensic analysis, or whatever.
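The parse-and-dispatch step can be pictured with a short Python sketch. It is not Ayehu's integration code. It assumes the alert arrives as a JSON body shaped like the payload of Splunk's webhook alert action (`search_name`, `sid`, `result`), which is worth verifying against your Splunk version, and the workflow functions and routing table are hypothetical.

```python
import json


# Hypothetical workflows; each just describes what it would do.
def restore_website(result: dict) -> str:
    return f"restore site on {result.get('host', 'unknown')}"


def isolate_host(result: dict) -> str:
    return f"isolate {result.get('host', 'unknown')}"


def collect_forensics(result: dict) -> str:
    return f"collect forensics for {result.get('host', 'unknown')}"


# Assumed routing table keyed by the name of the Splunk saved search.
WORKFLOWS = {
    "Website Defacement Detected": restore_website,
    "Suspicious Endpoint Activity": isolate_host,
}


def handle_alert(raw_body: str) -> str:
    """Parse the alert body and launch the matching workflow, falling back
    to forensic data gathering when the alert is not recognized."""
    payload = json.loads(raw_body)
    name = payload.get("search_name", "")
    result = payload.get("result") or {}
    workflow = WORKFLOWS.get(name, collect_forensics)
    return workflow(result)


sample = json.dumps({
    "search_name": "Suspicious Endpoint Activity",
    "sid": "example-sid",
    "result": {"host": "ws-042", "user": "jdoe"},
})
print(handle_alert(sample))
```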

While this is taking place, Ayehu also automatically creates a ticket in your ITSM, and updates it in real-time by documenting every step of the workflow. Once the workflow is done executing, Ayehu automatically closes the ticket. All of this can occur without any human intervention, or you can choose to keep humans in the loop.
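The ticket lifecycle in this closed loop (open, update at every step, close) can be sketched in a few lines of Python. The `Ticket` class is an in-memory stand-in for a real ITSM record, and escalating on a failed step is our assumption about one way to keep humans in the loop, not a documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Ticket:
    """In-memory stand-in for a ticket in ServiceNow, JIRA, etc."""
    summary: str
    status: str = "open"
    notes: List[str] = field(default_factory=list)


def run_closed_loop(summary: str,
                    steps: List[Tuple[str, Callable[[], bool]]]) -> Ticket:
    """Open a ticket, document every workflow step on it, and close it when
    all steps succeed. A failed step escalates instead (an assumption)."""
    ticket = Ticket(summary)
    for name, action in steps:
        ok = action()
        ticket.notes.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            ticket.status = "escalated"
            return ticket
    ticket.status = "closed"
    return ticket


resolved = run_closed_loop("Disk full on app-01", [
    ("purge temp files", lambda: True),
    ("verify free space", lambda: True),
])
stuck = run_closed_loop("Service down on app-02", [
    ("restart service", lambda: False),
    ("verify service", lambda: True),
])
print(resolved.status, resolved.notes)
print(stuck.status, stuck.notes)
```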

This closed-loop illustration also reveals why we think of Ayehu as a virtual operator, which we sometimes refer to as “Level 0 Tech Support”. Many incidents can simply be resolved automatically by Ayehu without human intervention, and without the need for attention from a Level 1 technician.

Imagine automating manual processes like Capture, Triage, Enrich, Respond, and Communicate. Automating resolution and remediation can result in a pretty significant savings of time, which can be particularly critical for data centers feeling overwhelmed.

Customers tell us over and over that automating the manual, tedious, time-intensive stuff accelerated their incident resolution by 90% or more.

We can also say with confidence that you can automate incident response for Splunk alerts in minutes, because Ayehu’s automation platform is agentless. Being agentless also makes us non-intrusive since we leverage APIs, SSH, and HTTPS behind enterprise firewalls under that organization’s security policy to perform automation. The only software to install is on a server, either physical or virtual, which centralizes management and greatly simplifies maintenance and upgrades.

Another reason it only takes minutes to automate incident response for Splunk alerts is because the Ayehu automation platform is codeless. This is something really important to consider because while there are many vendors out there touting their platforms as “automation”, the fact remains that they’re really just frameworks for scripting, and we steadfastly believe that scripting IS NOT automation.

For starters, in order to script you need to have programming expertise. With a true automation tool, however, you shouldn’t need to have any programming expertise. In fact, the automation platform should be so easy to use, even a junior SysAdmin with zero programming expertise should be able to master it in less than a day. Why is that so important? Because one of the promises of true automation is that you don’t have to rely on specialized talent to orchestrate activities in your environment. Requiring specialized programmers would be a bottleneck to that goal.

Finally, the Ayehu automation platform includes AI and Machine Learning built into the product.

The first thing you should know about Ayehu’s AI and Machine Learning efforts is that we’re partnered with SRI International (SRI), formerly known as the Stanford Research Institute. For those not familiar, SRI does high-level research for government agencies, commercial organizations, and private foundations. They also license their technologies, form strategic partnerships (like the one they have with us), and create spin-off companies. They hold more than 4,000 patents and patent applications worldwide to date. SRI is our design partner, and they’ve designed the algorithms and other elements of our AI/ML functionality. What they’ve done so far is pretty cool, but what we’re working on going forward is really exciting.

Questions and Answers

Q:          What are the pros and cons of using general purpose bot engines compared to your solution?

A:           General purpose bot engines won’t actually perform the actions on your infrastructure, devices, monitoring tools, business applications, etc. All they can really do is ingest a request. By contrast, Ayehu not only ingests requests, but actually executes the necessary actions needed to fulfill those requests. This adds a virtual operator to your environment that’s available 24x7x365. Additionally, Ayehu is a vendor-agnostic tool that interfaces with MS-Teams, Skype, etc. to provide these general purpose chat tools with intelligent automation capabilities.

Q:          Do you have an on-premise solution?

A:           Yes. Ayehu can be installed on-premise, on a public or private cloud, or in a hybrid combination of all three.

Q:          Do you have voice integration?

A:           Ayehu integrates with Amazon Alexa, and now also offers Angie™, a voice-enabled Intelligent Virtual Support Agent for IT Service Desks.

Q:          If a user selects a wrong choice (clicks the wrong button) how does he or she fix it?

A:           It depends on how the workflow is designed. Breakpoints can be inserted in the workflow to ask the endpoint user to confirm their button selection, or go back to reselect. Ayehu also offers error-handling mechanisms within the workflow itself.

Q:          Does Ayehu provide orchestration capabilities or do you rely on a 3rd party orchestration tool?

A:           Ayehu IS an enterprise-grade orchestration tool, offering over 500 pre-built platform-specific activities that allow you to orchestrate multi-platform workflows from a single pane of glass.

Q:          Can you explain in a bit more detail on intent-based interactions?

A:           Intent is just that, what the user’s intent is when interacting with the Virtual Support Agent (VSA). For example, if a user types “Change my password”, the intent could be categorized as “Password Reset”. That would then trigger the “Password Reset” workflow.

Q:          Thanks for the information so far, great content! I would like to know if I can use machine learning from an external source, train my model, and let Ayehu query my external source for additional information?

A:           Yes. Ayehu can integrate with any external source or application, especially when it has an API for us to interface with.

Q:          Can I create new automations to my inhouse applications?

A:           Yes. Ayehu can integrate with any application bi-directionally. Once integrated with your inhouse applications, Ayehu can execute automated actions upon them.

Q:          Is there an auto form-filling feature? (which can fill in a form in an existing web application)

A:           Yes. Ayehu provides a self-service capability that will allow this.

Q:          How can I improve or check how my workflows are working and helping my employees to resolve their issues?

A:           Ayehu provides an audit trail and reporting that provides visibility into workflow performance. Additionally, reports are available on time saved, ROI, MTTR, etc. that can quantify the benefits of those workflows.

Q:          What happens when your VSA cannot help the end user?

A:           The workflow behind the VSA can be configured to escalate to a live support agent.

Q:          If there is a long list of choices – what options do you have? Dropdown?

A:           In addition to the buttons, dropdowns will be provided soon in Slack as well.

Q:          Did I understand correctly, an admin will need to create the questions and button responses? If so, is this a scripted Virtual Agent to manage routine questions?

A:           Ayehu is scriptless and codeless. The workflow behind the VSA is configured to mimic the actions of a live support agent, which requires you to pre-configure the questions and expected answers in a deterministic manner.

Q:          Is NLP/NLU dependent on IBM Watson to understand intent?

A:           Yes, and soon Ayehu will be providing its own NLP/NLU services.

Q:          Are you using machine learning for creating the conversations? Or do I have to use intents and entities along with the dialogs?

A:           Not yet. You currently have to use intents and entities, but our road map includes using machine learning to provide suggestions that will improve the dialogs.

Q:          What are the other platforms that I can deploy the VSA apart from Slack?

A:           Microsoft Teams, Amazon Alexa, ServiceNow ConnectNow, LogMeIn, and any other chatbot using APIs.

This is a recap of a live Webinar we hosted in May 2019. To watch the on-demand recording and see this content in action, please click here.


Creating an Intelligent Virtual Support Agent for your ITSM

A topic that’s really getting a lot of buzz these days is Virtual Support Agents.  Virtual Support Agents – or VSAs – are the next logical evolution of a chatbot, because where chatbots are primarily conversational, a VSA is both conversational and actionable, making them much more valuable to an enterprise, and particularly to a service desk using an ITSM platform.

In order to better understand why VSAs are so top-of-mind, it can be helpful to take a step back and understand what’s happening right now in IT operations at enterprises around the world, particularly with all the data they’re dealing with.

Do you know what a Zettabyte is? NO GOOGLING!

A Zettabyte is a trillion Gigabytes. In bytes, that’s a “1” followed by 21 zeroes.

As humans, it can be hard for us to wrap our minds around numbers that large, so let’s use a visual metaphor to help provide a frame of reference for how much data 1 Zettabyte represents.

The visual metaphor we’d like you to envision should be a familiar one – grains of rice. For the sake of this visual comparison, let’s say one grain of rice is equal to one byte of data.

That makes a kilobyte, a thousand grains, equal to about a CUP of rice.

A megabyte of data would then be the equivalent of 8 BAGS of rice.

1 gigabyte of data in terms of rice is equal to 3 container TRUCKS.

A terabyte of data would then be the equivalent of 2 CONTAINER SHIPS full of nothing but rice.

Now get this: a petabyte of data in terms of rice would cover all of MANHATTAN.

An exabyte of data would just about cover all of TURKEY. (BTW, turkey & rice is a great combo!)

Finally, here’s what a Zettabyte of data in terms of rice would do.  It would fill up the PACIFIC OCEAN! 
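For reference, the decimal prefixes behind this comparison climb by a factor of 1,000 at each step, in the order kilo, mega, giga, tera, peta, exa, zetta. A quick Python check of the ladder, with names of our own choosing:

```python
# Decimal (SI) units in ascending order; each is 1,000 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit: str) -> int:
    """Number of bytes (grains of rice, in the metaphor) in one unit."""
    return 1000 ** (UNITS.index(unit) + 1)


for position, unit in enumerate(UNITS, start=1):
    print(f"1 {unit:<9} = 10^{3 * position} bytes")

# How many gigabytes fit into one zettabyte.
print(bytes_in("zettabyte") // bytes_in("gigabyte"))
```

The last line confirms that a zettabyte holds a trillion gigabytes.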

Now, this last visual using the Pacific Ocean is very relevant, especially if you work in IT operations.  That’s because you literally feel like an entire ocean’s worth of data is inundating you these days, thanks to all the systems you’re maintaining that create, store, access and deliver data for your employees, customers, partners, etc.  Life in IT operations is a relentless tsunami of incidents, events, thresholds being crossed, etc., and it’s only getting worse.

How much worse?

In 2017 The Economist published a chart produced by IDC in conjunction with Bloomberg estimating the size of data comprising “The Digital Universe”. In 2013 there were 4.4 Zettabytes of data worldwide, but by 2020 there will be 44 Zettabytes. That’s an astounding CAGR of roughly 39% over those seven years. Don’t expect things to slow down though, because it’s estimated that by 2025 there will be 175 zettabytes of data worldwide!
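The growth rate is easy to check with the standard compound annual growth rate formula, (end / start)^(1 / periods) - 1. Counting seven yearly periods from 2013 to 2020 gives about 39%, while counting only six periods gives about 47%. A small Python sketch using the figures quoted above:

```python
def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate over the given number of yearly periods."""
    return (end / start) ** (1 / periods) - 1


# 4.4 ZB in 2013 growing to 44 ZB in 2020.
seven_periods = cagr(4.4, 44, 7)  # 2013 -> 2020 counted as seven years
six_periods = cagr(4.4, 44, 6)    # the same growth squeezed into six years

print(f"7 periods: {seven_periods:.1%}")
print(f"6 periods: {six_periods:.1%}")
```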

Interestingly, IDC/Bloomberg discovered what appears to be a correlation between the exponential growth of data and an increase in the number of times companies are mentioning “artificial intelligence” in their earnings calls. This is probably not a coincidence, and it underscores something we say over and over at Ayehu – people don’t scale very well. 

Even the very best data center workers can only do so much. At some point, and that point is pretty much right now, end user self-service in concert with automation has got to take on a greater share of the service desk tasks all this growth in data is necessitating.

So when does it make the most sense for your end users to interface with Virtual Support Agents instead of live operators at the service desk? Well, one obvious answer to that might be as your organization’s first point of contact for L1 support issues.

We’re talking things like password resets, which Gartner estimates are responsible for as much as 40% of your service desk’s call volume.

Onboarding employees, and its closely related counterpart task offboarding employees, are also excellent tasks for a VSA.

How about VM provisioning and VM resizing?  These tasks lend themselves very well to a VSA interface.

Another good one is service restarts.  It makes a lot of sense to empower end-users to be able to do this themselves via a VSA.

And there are many, many more L1 support issues that would be great candidates for Virtual Support Agents.

But then what about your service desk?  What should your human agents do once the Virtual Support Agent has taken on all these tasks? 


Well, there are still probably a number of L1 support issues that are not well-suited for a VSA (at least not yet). The service desk can continue handling these.

Of course most, if not all of your L2 and L3 support issues are still best handled by a human service desk, for now.

And finally, vendor support issues are also probably still best managed by the service desk.

Still, deploying a Virtual Support Agent will clearly shift a lot of tedious, laborious tasks off of the service desk and free them up to do other things. But what kind of impact does this have on the cost structure of a service desk?

The average industry cost of handling an L1 support ticket is $22. In comparison, the average cost of a ticket handled by a Virtual Support Agent is just $2. Using these figures to do a back-of-the-envelope calculation of how many tickets can be off-loaded from the service desk to the VSA will likely yield a significant reduction in support costs. What’s more, issues handled by VSAs often get remediated much faster, resulting in greater end-user satisfaction than going through the service desk.
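Here is what that back-of-the-envelope calculation might look like in Python. The $22 and $2 per-ticket costs come from the figures above; the monthly ticket volume and the share of tickets shifted to the VSA are made-up inputs you would replace with your own numbers.

```python
L1_TICKET_COST = 22.0   # average cost of a human-handled L1 ticket (from the text)
VSA_TICKET_COST = 2.0   # average cost of a VSA-handled ticket (from the text)


def monthly_savings(tickets_per_month: int, share_to_vsa: float) -> float:
    """Savings from moving a share of L1 tickets to the VSA."""
    shifted = tickets_per_month * share_to_vsa
    return shifted * (L1_TICKET_COST - VSA_TICKET_COST)


# Assumed inputs: 5,000 L1 tickets a month, 40% of them handled by the VSA.
savings = monthly_savings(5000, 0.40)
print(f"Estimated monthly savings: ${savings:,.0f}")
print(f"Estimated annual savings:  ${savings * 12:,.0f}")
```

With those assumed inputs, 2,000 tickets a month move to the VSA at a saving of $20 each.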

And if end users are going through a VSA instead of the service desk, then that means the VSA can reroute a trainload of L1 incident tickets away from the service desk, freeing up that staff to focus on more important, and more strategic things.  This is a huge value proposition!

What about the benefits VSAs provide to end-users?

Here’s an obvious one. Ask any end-user how much they love waiting on hold when they need an issue resolved, but no one’s available to help them. Waiting on hold can really degrade the user’s experience and perception of the service desk. With Virtual Support Agents, on the other hand, no one ever gets put on hold because the VSA is available 24/7/365. VSAs never take a break, or call in sick, or get temperamental due to mood swings. They’re always available and ready to perform on-demand.

Then there’s the mean-time-to-resolution metric, better known as MTTR. As every service desk knows, the speediness of their incident remediation outcomes is one of the major KPIs they’re judged on.  Well, when it comes to speedy MTTR, a VSA should be faster than a human being just about every single time.

Finally, does your enterprise have younger employees, particularly from the millennial generation? Of course it does! Just about every organization does, and some have them in far greater percentages than others. Well, guess what? This generation, raised on Facebook and mobile apps, generally prefers interfacing with technology as opposed to people. And that’s something we all know empirically because we’ve seen it for ourselves.

In addition to that visual proof, a survey conducted in 2018 by Acquire.io found that 40% of millennials said that they chat with chatbots on a daily basis! So, providing VSAs that empower younger workers with self-service capabilities might just give your organization a competitive advantage in attracting the best and brightest young talent to your company.

Questions & Answers

Q:          What are the pros and cons of using general purpose bot engines compared to your solution?

A:           General purpose bot engines won’t actually perform the actions on your infrastructure, devices, monitoring tools, business applications, etc. All they could really do is ingest a request. By contrast, Ayehu could not only ingest a request, but actually execute the necessary actions needed to fulfill that request. This adds a virtual operator to your environment that’s available 24x7x365. Additionally, Ayehu is a vendor-agnostic tool that is capable of interfacing with MS-Teams, Skype, etc. to provide a general purpose chat tool with intelligent automation capabilities.

Q:          Do you have an on-premise solution?

A:           Yes.  Ayehu can be installed on-premise, on a public or private cloud, or in a hybrid combination of all three.

Q:          Do you have voice integration?

A:           Ayehu integrates with Amazon Alexa, and now also offers Angie™, a voice-enabled Intelligent Virtual Support Agent specifically designed for IT Service Desks.

Q:          If a user selects a wrong choice (clicks the wrong button) how does he or she fix it?

A:           It depends on how the workflow is designed. Breakpoints can be inserted in the workflow to ask the endpoint user to confirm their selection, or go back to reselect.  Ayehu also offers error-handling mechanisms within the workflow itself.

Q:          Does Ayehu provide orchestration capabilities or do you rely on a 3rd party orchestration tool?

A:           Ayehu IS an enterprise-grade orchestration tool, offering over 500 pre-built, platform-specific activities that allow you to orchestrate multi-platform workflows from a single pane of glass.

Q:          Can you please share the slide on IVA vs. ServiceDesk and elaborate a bit on the use cases?

A:           The entire PowerPoint file presented in this webinar can be found on SlideShare.

Q:          Can you explain in a bit more detail on intent-based interactions?

A:           Intent is just that: what the user’s intent is when interacting with the Virtual Support Agent (VSA). For example, if a user types “change my password”, the intent could be categorized as “password reset”. That would then automatically trigger the “password reset” workflow.
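As a rough illustration of intent-based routing, the Python sketch below maps an utterance to an intent and then to a workflow. Simple keyword matching stands in for a real NLP/NLU service here, and the intent names, keywords, and escalation fallback are assumptions for the example rather than Ayehu's configuration.

```python
from typing import Optional

# Assumed intents and trigger phrases; a real NLU service would do far
# better than substring matching.
INTENT_KEYWORDS = {
    "password reset": ("password", "locked out"),
    "service restart": ("restart", "reboot"),
    "vm provisioning": ("new vm", "provision"),
}


def classify(utterance: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def handle(utterance: str) -> str:
    """Trigger the workflow for the detected intent, or hand off to a
    live support agent when no intent is recognized."""
    intent = classify(utterance)
    if intent is None:
        return "escalate to live support agent"
    return f"run '{intent}' workflow"


print(handle("Change my password"))
print(handle("Please restart the print service"))
print(handle("What is for lunch today?"))
```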

Q:          Can we use machine learning from an external source, train our model, and let Ayehu query our external source for additional information?

A:           Yes. Ayehu can integrate with any external source or application, especially when it has an API for us to interface with.

Q:          Can I create new automations to my in-house applications?

A:           Yes. Ayehu can integrate with any application bi-directionally. Once integrated with your in-house applications, Ayehu can execute automated actions upon them.

Q:          Is there an auto form-filling feature that can fill in a form in an existing web application?

A:           Yes. Ayehu provides a self-service capability that will enable this.

Q:          How can I improve or check how my workflows are working and helping my employees to resolve their issues?

A:           Ayehu provides an audit trail and reporting that provides visibility into workflow performance. Additionally, reports are available on time saved, ROI, MTTR, etc. that can quantify the benefits of those workflows.

Q:          What happens when your VSA cannot help the end-user?

A:           The workflow behind the VSA can be configured to escalate to a live support agent.

Q:          If there is a long list of choices, what options do you have? Dropdown?

A:           In addition to the buttons, dropdowns will be provided soon in Slack as well.

Q:          Did I understand correctly, an admin will need to create the questions and button responses? If so, is this a scripted Virtual Agent to manage routine questions?

A:           Ayehu is scriptless and codeless. The workflow behind the VSA is configured to mimic the actions of a live support agent, which requires you to pre-configure the questions and expected answers in a deterministic manner.

Q:          Is NLP/NLU dependent on IBM Watson to understand intent?

A:           Yes, and soon Ayehu will be providing its own NLP/NLU services.

Q:          Are you using machine learning for creating the conversations? Or do we have to use intents and entities along with the dialogs?

A:           Not yet. You currently have to use intents and entities, but our road map includes using machine learning to provide suggestions that will improve the dialogs.

Q:          What are the other platforms from which I can deploy the VSA, apart from Slack?

A:           Microsoft Teams, Amazon Alexa, ServiceNow, ConnectNow, LogMeIn, and any other chatbot using APIs.

Missed the live Webinar? Watch it on-demand and see the above in action by clicking here.
