
IT Incidents: From Alert to Remediation in 15 seconds [Webinar Recap]

Author: Guy Nadivi

Remediating IT incidents in just seconds after receiving an alert isn’t just a good performance goal to strive for. Rapid remediation might also be critical to reducing and even mitigating downtime. That’s important, because the cost of downtime to an enterprise can be scary. Even scarier though is what can happen to people’s jobs if they’re found to be responsible for failing to prevent the incidents that resulted in those downtimes.

So let’s talk a bit about how automation can help you avoid situations that imperil your organization, and possibly your career.

Mean Time to Resolution (MTTR) is a foundational KPI for just about every organization. If someone asked you “On average, how long does it take your organization to remediate IT Incidents after an alert?” what would your answer be from the choices below?

  • Less than 5 minutes
  • 5 – 15 minutes
  • As much as an hour
  • More than an hour

In an informal poll during a webinar, here’s how our audience responded:

More than half said that, on average, it takes them more than an hour to remediate IT incidents after an alert. That’s in line with research by MetricNet, a provider of benchmarks, performance metrics, scorecards and business data to Information Technology and Call Center Professionals.

Their global benchmarking database shows that the average incident MTTR is 8.40 business hours, but ranges widely, from a high of 33.67 hours to a low of 0.67 hours (shown below in the little tabular inset to the right). This wide variation is driven by several factors including ticket backlog, user population density, and the complexity of tickets handled.
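To make the KPI concrete, here is a minimal Python sketch of how MTTR could be computed from ticket timestamps. The incident data below is made up for illustration, and the calculation uses elapsed clock time rather than the business hours MetricNet reports.

```python
from datetime import datetime

# Hypothetical (alert_time, resolved_time) pairs for three incidents.
incidents = [
    (datetime(2019, 6, 3, 9, 0), datetime(2019, 6, 3, 9, 30)),
    (datetime(2019, 6, 4, 14, 0), datetime(2019, 6, 4, 16, 0)),
    (datetime(2019, 6, 5, 8, 15), datetime(2019, 6, 5, 8, 45)),
]

def mean_time_to_resolution_hours(incidents):
    """Average elapsed hours between alert and resolution."""
    total_seconds = sum((resolved - alerted).total_seconds()
                        for alerted, resolved in incidents)
    return total_seconds / len(incidents) / 3600

mttr = mean_time_to_resolution_hours(incidents)
print(f"MTTR: {mttr:.2f} hours")  # prints "MTTR: 1.00 hours"
```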

Your mileage may vary, but obviously, it’s taking most organizations far longer than 15 seconds to remediate their incidents.

If that incident needing remediation involves a server outage, then the longer it takes to bring the server back up, the more it’s going to cost the organization.

Statista recently calculated the cost of enterprise server downtime, and what they found makes the phrase “time is money” seem like an understatement. According to Statista’s research, 60% of organizations worldwide reported that the average cost PER HOUR of enterprise server downtime was anywhere from $301,000 to $2 million!
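As a rough illustration of what those hourly figures imply, the Python sketch below multiplies an outage duration by the hourly cost range. The linear model and the 30-minute outage are assumptions for illustration only, not a formula from Statista or Ayehu.

```python
def downtime_cost(hours_down, cost_per_hour):
    """Simple linear estimate: outage duration times hourly cost."""
    return hours_down * cost_per_hour

# Hourly cost range from the Statista figures cited above.
LOW_PER_HOUR = 301_000
HIGH_PER_HOUR = 2_000_000

outage_hours = 0.5  # hypothetical 30-minute outage
low = downtime_cost(outage_hours, LOW_PER_HOUR)
high = downtime_cost(outage_hours, HIGH_PER_HOUR)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

Even with a short outage, the estimate scales directly with duration, which is why shaving minutes off remediation time matters.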

With server downtime being so expensive, Gartner has some interesting data points to share on that issue (ID G00377088 – April 9, 2019).

First off, they report receiving over 650 client inquiries between 2017 and 2019 on this topic, and we’re still not done with 2019. So clearly this is a topic that’s top-of-mind with C-suite executives.

Secondly, they state that through 2021, just 2 years from now, 65% of Infrastructure and Operations leaders will underinvest in their availability and recovery needs because they use estimated cost-of-downtime metrics.

As it turns out, Ayehu can help you get a more accurate estimate of your downtime costs so they’re not underestimated.

In our eBook titled “How to Measure IT Process Automation ROI”, there’s a specific formula for calculating the cost of downtime. The eBook is free to download on our website, and also includes access to all of our ROI formulas, which are fairly straightforward to calculate.

Let’s look at another data point about outages, this one from the Uptime Institute’s 2019 Annual Data Center Survey Results. They report that “Outages continue to cause significant problems for operators. Just over a third (34%) of all respondents had an outage or severe IT service degradation in the past year, while half (50%) had an outage or severe IT service degradation in the past three years.”

So if you were thinking painful outages only happen at your organization, think again. They’re happening everywhere. And as the research from Statista emphasized, when outages hit, it’s usually very expensive.

The Uptime Institute has an even more alarming statistic they’ve published.

They’ve found that more than 70% of all data center outages are caused by human error and not by a fault in the infrastructure design!

Let’s pause for a moment to ponder that. In 70% of cases, all it took to bring today’s most powerful high-tech to its knees was a person making an honest mistake.

That’s actually not too surprising though, is it? All of us have mistyped a keyboard stroke here or made an erroneous mouse click there. How many times has it happened that someone absent-mindedly pressed “Reply All” to an email meant for one person, then realized with horror that their message just went out to the entire organization?

So mistakes happen to everyone, and that includes data center operators. And unfortunately, when they make a mistake that leads to an outage, the consequences can be catastrophic.

One well-known example of an honest human mistake that led to a spectacular outage occurred back in late February of 2017. Someone on Amazon’s S3 team input a command incorrectly that led to the entire Amazon Simple Storage Service being taken down, which impacted 150,000 organizations and led to many millions of dollars in losses.

If infrastructure design usually isn’t the issue, and 70% of the time outages are a direct result of human error, then logic suggests that the key would be to eliminate the potential for human error. And just to emphasize the nuance of this point, we’re NOT advocating eliminating humans, but eliminating the potential for human error while keeping humans very much involved. How do we do that?

Well, you won’t be too surprised to learn we do it through automation.

Let’s start by taking a look at the typical infrastructure and operations troubleshooting process.

This process should look pretty familiar to you.

In general, many organizations (including large ones) do most of these phases manually. The problem with that is that it makes every phase of this process vulnerable to human error.

There’s a better way, however. It involves automating much of this process, which can reduce the time it takes to remediate an IT incident down to seconds. And automation isn’t just faster. It also eliminates the potential for human error, which should radically reduce the likelihood that your environment will experience an outage.

Here’s how that would work. It involves using the Ayehu platform as an integration hub in your environment. Ayehu would then connect to every system that needs to be interacted with when remediating an incident.

For example, if your environment has a monitoring system like SolarWinds, Big Panda, or Microsoft System Center, that’s where an incident will be detected first. The monitoring system (now integrated with Ayehu) will generate an alert which Ayehu will instantaneously intercept. (BTW – if there’s a monitoring system or any kind of platform in your environment that we don’t have an off-the-shelf integration for, it’s usually still pretty easy to connect to it via a REST API call.)
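For a sense of what connecting an unsupported monitoring system via REST might look like, here is a minimal Python sketch that packages an alert as a JSON POST request. The endpoint URL and payload fields are hypothetical placeholders, not Ayehu’s actual API; the real schema depends on your deployment.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL and payload schema depend on your deployment.
WEBHOOK_URL = "https://automation.example.com/api/alerts"

def build_alert_request(source, host, message, severity):
    """Package an alert as a JSON POST request (built here, not sent)."""
    payload = {"source": source, "host": host,
               "message": message, "severity": severity}
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_alert_request("custom-monitor", "web-01",
                              "Free disk space below 10%", "critical")
# To actually send it: urllib.request.urlopen(request, timeout=10)
```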

Ayehu will then parse that alert to determine what the underlying incident is, and launch an automated workflow to remediate it.
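Conceptually, parsing an alert and launching the matching workflow resembles a rule lookup. The Python sketch below is a simplified illustration with made-up rules and workflow functions; it does not represent how the Ayehu platform is implemented.

```python
import re

def free_disk_space(alert):
    return f"free disk space on {alert['host']}"

def restart_service(alert):
    return f"restart service on {alert['host']}"

# Hypothetical rules mapping alert text patterns to workflows.
RULES = [
    (re.compile(r"disk space", re.IGNORECASE), free_disk_space),
    (re.compile(r"service .* stopped", re.IGNORECASE), restart_service),
]

def dispatch(alert):
    """Run the first workflow whose pattern matches the alert message."""
    for pattern, workflow in RULES:
        if pattern.search(alert["message"]):
            return workflow(alert)
    return None  # no rule matched; leave the alert for a human

result = dispatch({"host": "web-01", "message": "Free disk space below 10%"})
print(result)  # prints "free disk space on web-01"
```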

As a first step in our workflow we’re going to automatically create a ticket in ServiceNow, BMC Remedy, JIRA, or any ITSM platform you prefer. Here again is where automation really shines over taking the manual approach, because letting the workflow handle the documentation will ensure that it gets done in a timely manner (in fact, in real-time) and that it gets done thoroughly. This brings relief to service desk staff who often don’t have the time or the patience to document every aspect of a resolution properly because they’re under such a heavy workload.

The next step, and actually this can be at any step within that workflow, is pausing its execution to notify and seek human approval for continuation. To illustrate why you might do this, let’s say that a workflow got triggered because SolarWinds generated an alert that a server dropped below 10% free disk space. The workflow could then go and delete a bunch of temp files, it could compress a bunch of log files and move them somewhere else, and do all sorts of other things to free up space. Before it does any of that though, the workflow can be configured to require human approval for any of those steps.

The human can either grant or deny approval so the workflow can continue on, and that decision can be delivered via laptop, smartphone, email, instant messenger, or even regular telephone. However, please note that this notification/approval phase is entirely optional. You can also choose to put the workflow on autopilot and proceed without any human intervention. It’s all up to you, and either option is easy to implement.
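The optional approval step can be pictured as a gate in front of each action. Here is a minimal Python sketch, assuming a hypothetical `approve` callback that stands in for whatever channel delivers the human decision (email, chat, phone); the step names are illustrative.

```python
def run_with_approval(steps, approve, require_approval=True):
    """Run each step in order; when approval is required, ask before every step.

    `approve` is a callable that returns True to continue or False to stop.
    """
    log = []
    for name, action in steps:
        if require_approval and not approve(name):
            log.append(f"{name}: denied, workflow stopped")
            break
        action()
        log.append(f"{name}: done")
    return log

# Hypothetical remediation steps for a low-disk-space alert.
steps = [
    ("delete temp files", lambda: None),
    ("compress and move logs", lambda: None),
]

# Simulated approver that only allows deleting temp files.
log = run_with_approval(steps, approve=lambda name: name == "delete temp files")
print(log)
```

Passing `require_approval=False` corresponds to the autopilot option: every step runs without asking.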

Then the workflow can begin remediating the incident which triggered the alert.

As the remediation is taking place, Ayehu can update the service desk ticket in real-time by documenting every step of the incident remediation process.

Once the incident remediation is completed, Ayehu can automatically close the ticket.

Finally, Ayehu can go back into the monitoring system and automatically dismiss the alert that triggered the entire process.
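Putting the steps together, the closed loop (create ticket, remediate, document, close ticket, dismiss alert) can be sketched in a few lines of Python. The `Ticket` class and the alert dictionary are in-memory stand-ins invented for illustration, not real ITSM or monitoring APIs.

```python
class Ticket:
    """In-memory stand-in for an ITSM ticket."""
    def __init__(self, summary):
        self.summary = summary
        self.notes = []
        self.status = "open"

def remediate(alert, steps):
    """Create a ticket, log each step, close the ticket, dismiss the alert."""
    ticket = Ticket(alert["message"])
    for name, action in steps:
        action()
        ticket.notes.append(f"completed: {name}")  # document as we go
    ticket.status = "closed"
    alert["dismissed"] = True
    return ticket

alert = {"host": "web-01", "message": "Free disk space below 10%", "dismissed": False}
ticket = remediate(alert, [("delete temp files", lambda: None)])
print(ticket.status, ticket.notes, alert["dismissed"])
```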

This, by the way, illustrates why we think of Ayehu as a virtual operator which we sometimes refer to as “Level 0 Tech Support”. A lot of incidents can be resolved automatically by Ayehu without any human intervention, and thus without the need for attention from a Level 1 technician.

This then is how you can go from alert to remediation in 15 seconds, while simultaneously eliminating the potential for human error that can lead to outages in your environment.

Gartner concurs with this approach.

In a recently refreshed paper they published (ID G00336149 – April 11, 2019) one of their Vice-Presidents wrote that “The intricacy of access layer network decisions and the aggravation of end-user downtime are more than IT organizations can handle. Infrastructure and operations leaders must implement automation and artificial intelligence solutions to reduce mundane tasks and lost productivity.”

No ambiguity there.

Gartner’s advice is a good opportunity for me to segue into one last topic – artificial intelligence.

The Ayehu platform has AI built-in, and it’s one of the reasons you’ll be able to not only quickly remediate your IT incidents, but also quickly build the workflows that will do that remediation.

Ayehu is partnered with SRI International (SRI), formerly known as the Stanford Research Institute. In case you’re not familiar with them, SRI does high-level research for government agencies, commercial organizations, and private foundations. They also license their technologies, form strategic partnerships (like the one they have with us) and create spin-off companies. They’ve received more than 4,000 patents and patent applications worldwide to date. SRI is our design partner, and they’ve designed the algorithms and other elements of our AI/ML functionality. What they’ve done so far is pretty cool, but what we’re working on going forward is what’s really exciting.

One of the ways Ayehu implements AI is through VSAs, which is shorthand for “Virtual Support Agents”.

VSAs differ from chatbots in that they’re not only conversational, but more importantly they’re also actionable. That makes them the next logical step or evolution up from a chatbot. However, in order for a VSA to execute actionable tasks and be functionally useful, it has to be plugged in to an enterprise grade automation platform that can carry out a user’s request intelligently.

We deliver a lot of our VSA functionality through Slack, and we also have integrations with Alexa and IBM Watson. We’re also incorporating an MS-Teams interface, and looking into others as well.

How is this relevant to remediating incidents?

Well, if a service desk can offload a larger portion of its tickets to VSAs, and provide its users with more of a self-service modality, then that frees up the service desk staff to automate more of the kinds of data center tasks that are tedious, repetitive, and prone to human error. And as I’ve previously stated, eliminating the potential for human error is key to reducing the likelihood of outages.

Speaking of tickets, another informal webinar poll we conducted asked:

On average, how many support tickets per month does your IT organization deal with?

  • Less than 100
  • 101 – 250
  • 251 – 1,000
  • More than 1,000

Here’s how our audience responded:

Nearly 90% receive 251 or more tickets per month. Over half get more than 1,000!

For comparison, the Zendesk Benchmark reports that among their customers, the average is 777 tickets per month.

Given the volume of tickets received per month, the current average duration it takes to remediate an incident, and most importantly the onerous cost of downtime, automation can go a long way towards helping service desks maximize their efficiency by being a force multiplier for existing staff.

Q:          What types of notifications can the VSA send at the time of incident?

A:           Notifications can be delivered either as text or speech.

Q:          How does the Ayehu tool differ from other leading RPA tools available on the market?

A:           RPA tools are typically doing screen automation with an agent. Ayehu’s automation is an agentless platform that primarily interfaces with backend APIs.

Q:          Do we have to do API programming or other scripting as a part of implementation?

A:           No. Ayehu’s out-of-the-box integrations typically only require a few configuration parameters.

Q:          Do we have an option to create custom activities? If so, which programming language should be used?

A:           In our roadmap, we will be offering the ability to create custom activity content out-of-the-box.

Q:          Do out-of-the-box workflows work on all types of operating systems?

A:           Yes. You just define the type of operating system within the workflow.

Q:          How does Ayehu connect and authenticate with various endpoint devices (e.g. Windows, UNIX, network devices, etc.)? Is it password-less, connection through a password vault, etc?

A:           That depends on what type of authentication is required internally by the organization. Ayehu’s integration with the CyberArk password vault can be leveraged when privileged account credentials are involved. Any user credential information that is manually input into a workflow or device is encrypted within Ayehu’s database. Also, certificates on SSH commands, Windows authentication, and localized authentication are all accessible out-of-the-box. Please contact us for questions about security scenarios specific to your environment.

Q:          What are all the possible modes that VSAs can interact with End Users?

A:           Text, Text-to-Speech, and Buttons.

Q:          Can we create role-based access for Ayehu?

A:           Yes. That’s a standard function which can also be controlled by and synchronized with Active Directory groups out-of-the-box.

Q:          Apart from incident tickets, does Ayehu operate on request tickets (e.g. on-demand access management, software requests from end-users, etc.)?

A:           Yes. The integration packs we offer for ServiceNow, JIRA, BMC Remedy, etc. all provide this capability for both standard and custom forms.

Q:          Does Ayehu provide APIs for an integration that’s not available out of the box?

A:           Yes. There are two options. You can either forward an event to Ayehu using our webservice, which is based on a RESTful API, or from within the workflow you can send messages outbound that are either scheduled or event-driven. This allows you to do things such as make a database call, set an SNMP trap, handle SYSLOG messages, etc.

Q:          Does Ayehu provide any learning portal for developers to learn how to use the tool?

A:           Yes. The Ayehu Automation Academy is an online Learning Management System we just launched recently. It includes exams that provide you an opportunity to bolster your professional credentials by earning a certification. If you’re looking to advance your organization’s move to an automated future, as well as your career prospects, be sure to check out the Academy.

Q:          Does Ayehu identify issues like a monitoring tool does?

A:           Ayehu is not a monitoring tool like SolarWinds, Big Panda, etc. Once Ayehu receives an alert from one of those monitoring systems, it can trigger a workflow that remediates the underlying incident which generated that alert.

Q:          We have 7 different monitoring systems in our environment. Can Ayehu accept alerts from all of them simultaneously?

A:           Yes. Ayehu’s integrations are independent of one another, and it can also accept alerts from webservices. We have numerous deployments where thousands of alerts are received from a variety of sources and Ayehu can scale up to handle them all.

Q:          What does the AI in Ayehu do?

A:           AI is used in several areas: understanding intent through chatbots, recommending workflow designs, and suggesting workflows to remediate events through the Ayehu Brain service. Please contact an account executive to learn more.


Leveraging IT Process Automation to Manage Blackouts

Perhaps there is no greater burden on the minds of IT professionals than the thought of an impending blackout. System outages can cost businesses a lot of money and impact service levels, both internally and externally. To be properly prepared to handle such an event, IT must have a plan in place that will allow them to act swiftly and, if possible, proactively to prevent or limit damages. IT process automation can help close the gap on such a plan.

What is a blackout?

In the simplest of terms, a blackout is an event that takes down systems for either an emergency or scheduled maintenance. In emergency situations, there is little time to plan ahead, making a blackout much more dangerous and time-sensitive. Blackouts for scheduled maintenance, on the other hand, although still requiring prompt attention, are less of a threat since they can be well planned out and carefully executed.

Blackouts can be defined for one target in particular, multiple targets or all targets. Depending on need, scheduled blackouts can be planned well in advance, and can be set to run indefinitely, for a specified time period or just on an as-needed basis. Blackout periods can be extended or shortened mid-stream if necessary and the results are typically assessed by the IT team immediately after the systems are brought back up.

How can blackouts cause a problem?

In the event of an emergency blackout, or one in which the administrator inadvertently performs maintenance without executing a scheduled blackout, the target downtime can impact availability records. Unscheduled and even scheduled downtime can impact business functions across all departments, and even affect the organization’s bottom line if not handled properly. The key is to manage these blackouts efficiently, limiting downtime and reducing the impact on availability.

How can ITPA help manage blackouts more efficiently?

Because it allows for the systematic automation of routine tasks, ITPA is the perfect solution for managing both scheduled and emergency blackouts. For planned outages, the tool can be customized and defined to trigger the blackout one step at a time at the specified time or interval. This is particularly helpful for routine, repetitive outages for regular system maintenance (say, maintenance performed once a month on the 1st or last day). By leveraging technology to handle these routine tasks, the IT department can focus on more important matters.
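As a small illustration of the monthly schedule mentioned above, the Python sketch below checks whether a given date is the first or last day of its month, which is the kind of test a scheduler could use to trigger a planned blackout. The function name is made up for this example.

```python
import calendar
from datetime import date

def is_scheduled_blackout_day(day):
    """True when `day` is the first or last day of its month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return day.day in (1, last_day)

print(is_scheduled_blackout_day(date(2024, 2, 29)))  # True: last day of a leap-year February
print(is_scheduled_blackout_day(date(2024, 2, 15)))  # False: mid-month
```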

For those instances when a blackout is scheduled but is not necessarily “routine”, ITPA can still be used in conjunction with human intervention. The automation tool can be programmed to send out notifications or stop at certain intervals and wait for input or instruction from the appropriate party.

Where automation really shines is in the event of an emergency blackout. ITPA makes the monitoring and notification of system events simple and effortless. In fact, in many cases, critical incidents can be detected before they cause any problems for the end-user, allowing the IT department to be proactive about managing the problem immediately. This can even sometimes eliminate the need for an emergency blackout altogether, or at the very least create the opportunity for IT to schedule and plan the outage.

In IT, blackouts are never completely avoidable. ITPA can help manage the process more seamlessly, whether it’s a planned outage or something unexpected.

To learn more about automation and how it can help your business manage blackouts, click here or start your free trial today.









Webinar: Streamline IT Incident Management, January 20

JOINT WEBINAR BY MIR3 AND AYEHU REVEALS HOW AUTOMATION CAN STREAMLINE IT INCIDENT MANAGEMENT

Webinar shows how prominent organizations today are using automated, closed-loop systems to enhance IT incident management

For many IT operations, the flow between incidents, alerts, and remediation is fragmented and involves time-consuming manual activities. In this webinar we’ll show how organizations today are using closed-loop systems with automated notification to streamline IT incident management and increase efficiency.

The webinar takes place on Wednesday, January 20 at 11:00am PT (2:00pm ET). Registration is required, though there is no charge to attend.

The speakers will explore a scenario where an IT system alert is intercepted by an intelligent, automated tool that parses it, and then launches a workflow that finds on-call team members or SMEs to address the alert. That same workflow can evaluate the incident and determine which stakeholders, customers or management resources should automatically receive an informational message.

The recipient can be offered options to trigger automated remediation steps, or standard forensic steps can be executed automatically to immediately begin the resolution process. At the same time, a help desk ticket can be generated and automatically updated every step of the way; once remediated, the ticket is closed by the system, and the alert is dismissed.

The presenters will introduce actual case studies of prominent organizations that have streamlined their IT incident management with automation.

The webinar will be presented by Janice Hight, senior director of sales engineering for MIR3, and Guy Nadivi, director of business development at Ayehu Inc.

About MIR3
MIR3, Inc. is the leading developer of Intelligent Notification and response software, which helps organizations enhance communication, protect assets, and increase operational efficiency. MIR3 technology enables advanced rapid, two-way communication for IT, business continuity, and enterprise operations for many of the Global Fortune 100 companies, as well as government entities, universities and companies of all sizes in more than 130 countries. For more information, visit www.mir3.com. Follow MIR3 on Twitter: @MIR3.

About Ayehu

Ayehu provides IT Process Automation solutions for IT and Security professionals to identify and resolve critical incidents, simplify complex workflows, and maintain greater control over IT infrastructure through automation. Ayehu solutions have been deployed by major enterprises worldwide, and currently support thousands of IT processes across the globe. The company has offices in New York and Tel Aviv, Israel. For more information please visit www.ayehu.com


How Automated Cyber Security Incident Response can Protect Government Infrastructures

When the term security breach is used, many of us envision corporate retail giants or global financial institutions becoming the latest victims. The truth is, nobody is immune to such a risk – including the government. In fact, given the wealth of highly classified information and sensitive data these infrastructures contain, the threat of such a security breach can have much more dire implications. The good news is, automated cyber security incident response offers a real, actionable and effective solution to incident management.

The Best Defense is a Good Offense

The reason cyber-threats are so ominous is the fact that those behind them are becoming savvier by the day. As a result, the tools used to combat these risks and manage incoming incidents effectively must be equally sophisticated and ever-evolving. Not only does having an army of human workers handling this daunting task leave room for costly errors, but it’s also something that most government organizations simply cannot afford.

Automated cyber security incident response, on the other hand, ensures that any and all incoming alerts are identified, analyzed, prioritized and addressed in the most timely and effective manner. As an added bonus, this can be done with only the bare minimum in terms of personnel. Essentially, automated incident response provides the ability to do more with less while also achieving a greater level of protection against dangerous security risks.

Immediate Response is Critical

With the very security of a nation potentially at risk, the timeliness of incident response is absolutely critical to government agencies. While a breach can happen in an instant, the after-effect can take months to overcome and cost an enormous amount of money. To avoid this, those in charge of security must invest in the appropriate tools which will ensure that any incident that occurs is immediately detected and expeditiously dealt with.

This is another area where automated incident response is highly effective. Removing the human element of incident management not only speeds up the process, but it all but eliminates the possibility of an alert being overlooked and allowed to wreak havoc. Oftentimes just identifying the appropriate party to handle a cyber-attack can be a costly and impactful waste of time. The right automation tool will ensure that alert notifications and escalations are handled properly.

Streamlining Systems for Maximum Performance

Many of those in charge of security at a government level have been hesitant about adopting automated incident response for fear of going over an already tight budget. What isn’t being taken into account, however, is how versatile and agile a quality automation platform can be. In fact, such a tool can easily be implemented with little to no interruption, and without the need to replace existing systems. Rather, the right product will seamlessly integrate with legacy systems to further enhance the incident response process. Most importantly, doing so is much less expensive than one may imagine. Likewise, the added level of protection is well worth any initial investment, with far-reaching benefits for many years to come.

A Proactive Approach for the Future

In addition to real-time incident management, automation also provides forward-thinking government agencies the ability to project and prepare for future problems before they occur. Identifying and outlining best practices and being proactive about cyber-threats can vastly decrease the odds of a breach occurring, thereby enhancing security from the forefront.

These days, nobody is safe from the dangers of cyber-security threats. Government agencies are at just as great a risk, if not more so, than other organizations, and therefore must take the appropriate measures to protect sensitive data. With an automated cyber security incident response plan in place, the dangers can be greatly reduced and any potential damages mitigated while also reducing costs and improving operational efficiency. It’s a win-win.

To learn more about automated cyber security incident response and how it can better protect government infrastructures, click here or download your free trial today.









Minimizing Mean Time to Resolution (MTTR) with IT Process Automation

Any seasoned IT professional will tell you that one of the biggest challenges they face in their day-to-day job is reducing mean time to resolution (MTTR), or the amount of time it takes to get key systems back up and running after an incident. Downtime in any industry can have a significant impact on both internal operations and external service levels. And the longer it takes to get things resolved, the worse the problems can become. IT process automation can make minimizing MTTR easier and more effective.

Managing mean time to resolution involves 4 main steps:

  • Identifying the problem
  • Uncovering the root cause of the problem
  • Correcting the problem
  • Testing to verify that the problem has successfully been resolved

How quickly you can achieve the first step will ultimately depend on the quality of the monitoring system you have in place. Having a basic system can only get you so far, but leaves a lot of room for costly error. Depending on how many incoming alerts your organization fields, staying on top of them can be too much for a small IT department. That means serious issues could slip through the cracks and cause major problems down the road. Enhancing your system with IT Process Automation can create a highly effective, closed-loop solution, ensuring that all critical incidents requiring attention are received and prioritized accordingly.

Once an incident is identified, the next step is determining its root cause. This is the costliest part of the MTTR equation because it takes time, resources and manpower. Obviously, the more serious the issue, the more quickly it needs to be addressed. This may require “all hands on deck” to help uncover the cause so it can be corrected. It’s also important that there is visibility and accountability at all times throughout the process. Who is handling the problem? What steps have been taken so far to get to the bottom of it? Has anything been missed? Again, automation can offer this by providing real-time status of incidents, ownership, severity and priority in one central dashboard.

As soon as the problem has been properly diagnosed, the third step is taking the necessary actions to resolve it as quickly and effectively as possible. With most incidents, time is of the essence, so developing a solution is critical. One of the biggest benefits of integrating automation into your incident management process is that it can actually predict Mean Time to Resolution based on historic events. This can provide a guideline for the resolution process and alleviate some of the stress that naturally arises during a downtime. The IT team will be able to work quickly and efficiently to implement a solution that will get systems back up and running fast, limiting the negative effects on the company.
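One naive way to estimate resolution time from historic events is to average past resolution times per incident category. The Python sketch below does exactly that with made-up data; it is an illustrative assumption, not a description of how any particular product computes its predictions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history of (incident_category, minutes_to_resolve).
history = [
    ("disk_full", 10), ("disk_full", 20),
    ("service_down", 45), ("service_down", 15), ("service_down", 30),
]

def expected_resolution_minutes(history):
    """Average past resolution time for each incident category."""
    by_category = defaultdict(list)
    for category, minutes in history:
        by_category[category].append(minutes)
    return {category: mean(times) for category, times in by_category.items()}

estimates = expected_resolution_minutes(history)
print(estimates)
```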

The final step in the MTTR process is testing to ensure that the problem is, indeed, resolved. It’s also important to assess each process to identify areas that can be improved. Being proactive can help to understand the best way to deal with similar incidents and can even help to avoid them completely.

In conclusion, managing the mean time to resolution process involves careful monitoring and the right tools, specifically IT process automation. This can provide the most timely and effective response and a faster overall turnaround, thereby reducing or even eliminating impact on the business. If your current incident response system isn’t producing these results or you’d like to learn more about how ITPA can dramatically reduce your MTTR, call us today at 1-800-652-5601 or download a free 30 day trial.




How to Get Critical Systems Back Online in Minutes




Why Incident Management Should be The Next IT Process You Automate

We’re always hearing about how IT automation can revolutionize certain work functions, namely complex IT processes and workflows. What we don’t hear too often is how this powerful tool can also be used to streamline other important business functions. One of the best uses for automation is IT operations management, also known as incident management. If this happens to be on your list of tasks, here are some compelling reasons why you should consider implementing an automation tool.

Provides a more proactive approach to managing incidents. When you automate your event management function, all of your incidents will become more visible much earlier than if handled in the traditional way. This means that potential incidents can be addressed in a timelier manner, often before they have a chance to cause any serious harm. This benefits the entire organization.

Improves response and resolution times. Because IT personnel are able to view and manage incidents in a timelier fashion, response and resolution times will also naturally improve. In fact, the right automation product can reduce downtime by up to 90%.

Applies accountability and transparency. With manual event management, it’s much easier for things to slip through the cracks and team members to drop the ball. When you’ve got the right automation tool in place, however, everything from start to finish is visible and transparent. This ends the “blame game” and creates a more cohesive team environment.

Helps to prioritize and manage incoming alerts. Anyone working in event management knows the impact choosing the wrong event to address can have on operations. Automation helps to eliminate this risk by correlating and prioritizing incoming alerts. This allows IT to more effectively allocate resources so that the most critical alerts are handled first.
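One way to picture correlating and prioritizing alerts is the sketch below. It folds repeated alerts for the same host and check into a single entry, then sorts the result so the most critical entries are handled first. The Alert fields, the three-level severity scale and the function names are assumptions for illustration, not the behavior of any particular product.

```python
from dataclasses import dataclass

# Assumed severity scale for this sketch; lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Alert:
    host: str
    check: str
    severity: str


def correlate(alerts):
    """Collapse repeated alerts for the same host and check into one entry.

    Keeps the most severe alert seen for each (host, check) pair and counts
    how many raw alerts were folded into it.
    """
    grouped = {}
    for alert in alerts:
        key = (alert.host, alert.check)
        if key not in grouped:
            grouped[key] = [alert, 1]
            continue
        current, count = grouped[key]
        if SEVERITY_RANK[alert.severity] < SEVERITY_RANK[current.severity]:
            current = alert
        grouped[key] = [current, count + 1]
    return [(alert, count) for alert, count in grouped.values()]


def prioritize(correlated):
    """Order correlated alerts so the most critical ones come first."""
    return sorted(correlated, key=lambda item: SEVERITY_RANK[item[0].severity])
```

Passing raw alerts through prioritize(correlate(alerts)) yields a shorter, ordered work queue, which is the point of the paragraph above: fewer duplicates to chase and the critical items at the top.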

Reduces number of full-blown incidents. By adding a layer between event management and incident management, you are able to reduce the number of actual incidents that will need to be escalated. A lower volume of unnecessary incidents allows IT to work smarter, which benefits the company as a whole. According to a presentation by CDW at Knowledge 14, implementing a quality event management solution resulted in a greater than 30% reduction in incident volume.

Opens the door for further IT automation in the future. By streamlining the event management process through automation, additional opportunities will begin to present themselves where automation could provide even more benefit to your organization.

Next steps…

Now that you recognize how automation can revolutionize the way you handle event management within your organization, the next step is determining the current position you are in. Conduct a needs assessment to figure out what your pain points are, and what you’re currently working with in terms of monitoring and event management. Specifically, will you need to integrate the two? This will help you know what to look for in an automation product.




eBook: 5 Reasons You Should Automate Cyber Security Incident Response




IT Process Automation: Moving from Basics to more Advanced and Creative Processes

We’ve talked about the importance of starting small with IT process automation, tackling one or two basic repetitive tasks and then building from there. While many businesses are safe taking this route, for others, basic IT process automation is merely a tiny stepping stone that must give way to more complex IT automation strategies. This is especially critical given the increasing challenges IT personnel are facing with big data and mobility becoming more prominent. With that said, let’s take a look at the differences between basic automation, advanced techniques and the more creative, think-outside-the-box strategies.

Basic IT Process Automation Examples

Basic IT automation refers to out-of-the-box solutions that are easy to learn and seamless to implement. Some examples of basic IT automation include:

  • Routine maintenance tasks – Perhaps the easiest and most frequently automated tasks are those that need to be performed on a regular basis but do not require a great deal of input from live personnel. These routine tasks, such as disk cleanups, browser cleanups and even password resets, take up a tremendous amount of time when done manually, so they are ideal for beginner IT automation.
  • Third party application management – Keeping third party applications, such as Adobe or Firefox, up to date and running smoothly is a time-consuming task for IT. With some basic configuration, IT automation can be used to deploy and update these apps as needed.
  • Patch management – IT Automation allows IT to “set and forget” patch approval and reboots for multiple workstations and servers.
  • System Auditing – Stay on top of important every-day alerts, such as low memory, without having to do so manually. Automation tracks and manages incoming alerts, setting into action the necessary workflows to correct problems before they occur.
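A basic task such as a disk cleanup usually starts with a simple threshold check. The sketch below is a minimal example of that idea, assuming a 10% free-space threshold and hypothetical function names; the cleanup workflow itself is left out.

```python
import shutil


def needs_cleanup(total_bytes, free_bytes, min_free_ratio=0.10):
    """Return True when free space falls below the configured ratio."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_ratio


def check_disk(path="/", min_free_ratio=0.10):
    """Read disk usage for `path` and report whether cleanup should run."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return needs_cleanup(usage.total, usage.free, min_free_ratio)
```

A scheduler could call check_disk() on each server at a regular interval and start the cleanup workflow only when it returns True.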

Advanced IT Process Automation Examples

As IT becomes more comfortable with automation, and the need for more advanced strategies becomes more evident, a robust software product can facilitate any of the following complex workflows:

  • Service desk and incident management – Automated workflows can be developed and deployed to handle the entire alert management process, end to end. Additionally, critical diagnostic information can be gathered and reported so that if and when human intervention is required, the data needed to correct the issue quickly and effectively will be on hand.
  • Monitoring and reporting – Utilize policy-based automation to filter server type and location, and then develop a specific policy that involves continuous system monitoring and enhanced reporting.
  • Self-service for end-users – Empower the end user to manage basic issues that may arise using a self-service end-user portal. To make this process more efficient, create “how-to” content and provide self-help procedures for users to follow.
  • Automatic application management – Establish a policy to detect any non-compliance for all applications that are on auto startup. Those applications which are flagged as non-compliant can then be automatically removed, if desired, to reduce future security issues and improve performance.
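The auto-startup compliance policy in the last item above can be sketched as a comparison against an approved list. The application names and the allowlist here are hypothetical, and how the list of startup applications is collected from each machine is outside the scope of this example.

```python
# Hypothetical allowlist of applications approved to run at startup.
APPROVED_STARTUP_APPS = {"antivirus", "backup-agent", "vpn-client"}


def find_non_compliant(startup_apps, approved=APPROVED_STARTUP_APPS):
    """Return startup applications that are not on the approved list, sorted."""
    return sorted(set(startup_apps) - set(approved))
```

Anything returned by find_non_compliant() would be flagged and, if the policy allows, handed to a removal workflow.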

Getting Creative with IT Process Automation

Of course, all of the above scenarios are pretty common and are regularly automated by businesses across the globe. With the right tools and a think-outside-the-box attitude, however, there is almost no limit to what can be accomplished through automation. A few examples of creative automation include:

  • Mail server management – Proactively automate the monitoring of your Exchange server and Quality of Service (QoS) by running routine tests to verify that the server is able to send and receive email.
  • Clean up and reduce help desk tickets – Excess “bloatware” bogging your IT group down? IT process automation can help by using an approved configuration to automatically detect deviations and clean up the excess on your behalf.
  • Recovery of stolen mobile devices – You may think it’s hopeless finding that stolen laptop, but maybe not – that is, if you have the right automation tools in place. Leveraging Google location APIs, you can pinpoint the geo location of your equipment. Automating screenshot captures can further bolster your evidence.
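As a starting point for the mail server test mentioned above, the sketch below checks only that an SMTP server accepts a connection and answers a NOOP command. It is a simplified assumption of what such a test might look like: a full check would also send a test message and confirm that it arrives. The function name and the injectable smtp_factory parameter are illustrative.

```python
import smtplib


def smtp_is_responsive(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Return True if the server accepts a connection and answers NOOP with 250.

    `smtp_factory` is injectable so the check can be exercised without a
    real mail server.
    """
    try:
        with smtp_factory(host, port, timeout=timeout) as connection:
            code, _message = connection.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        return False
```

Run on a schedule, a False result could open an incident or trigger a restart workflow before users notice that mail has stopped flowing.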

Regardless of how you choose to leverage IT Process Automation, there’s no question that with the boom of cloud, mobile and big data looming on the horizon, it’s something that will become a necessary component of business success. Feel free to start small, but don’t limit yourself. Expanding to advanced automation strategies and then dabbling with more creative uses will help you to steamroll head-on into the bright and promising future.




IT Process Automation Survival Guide




Incident Response: A Common Pitfall that Can be Avoided

These days, it seems we cannot turn on the news or go online without learning about another major security breach. The most recent and disastrous are those that hit a number of popular retailers, like Target and Home Depot. What is the common thread amongst those affected by cyber-attacks? According to investigators, the problem can be linked back to a lack of incident response in nearly every single case.

Yet despite the fact that countless news articles and reports have indicated this as the root problem, many organizations are still not taking proactive measures to protect themselves, their employees and their customers. There are plenty of reasons why, but the main ones seem to be:

They believe their current protection is adequate. Many IT professionals feel that the plan they already have in place is capable of thwarting any would-be attacks. The problem is, most of these existing plans only include preventative measures, such as anti-malware software. As the entire world learned from Target’s experience, this isn’t always enough to get the job done. Incident management that involves identifying, verifying, prioritizing and sending appropriate notification of incoming alerts is essential.

They don’t believe it can or will happen to them. Some companies feel that because they are smaller, they aren’t at risk. This is simply not true. Others – such as those in Europe – feel that they aren’t as targeted as businesses in other countries, like the US. The fact is, the only reason more breaches are reported in the US is because the government requires it. A similar number of incidents occurs in countries across the globe.

They don’t understand the real damage an attack can have. Some otherwise intelligent professionals put blinders on when it comes to the subject of cyber-attacks. Sure, retail giants felt a huge impact – as did their customer-base of millions. It’s important to note, however, that smaller organizations, even those who do not have to worry about sensitive client data, have valuable assets that could prove to be disastrous if they fall into the wrong hands. For instance, internal employee information and even trade secrets could be stolen if the company is not properly protected.

For these reasons (and countless others), many businesses fail to recognize the importance and overall value of a quality incident response plan. If you’re reading this and happen to fall into this category, let’s take a closer look at some of the many benefits of developing and implementing an incident response strategy for your business.

  • Reduce downtime. What impact would an entire system shut-down have on your business? One thing is for certain, the longer it takes to bring things back up and running, the worse the consequences will be. By managing incidents more effectively, issues can be responded to immediately, ultimately reducing the amount of downtime your organization will have to face.
  • Improve recovery time. Just as important as bringing systems back up and running is the task of rolling out a recovery plan. It only stands to reason that the more downtime, the more extensive the potential damage. Because quality incident response lets you address issues right away, the time and resources it takes to fully recover are limited.
  • Stay ahead of problems. With the right incident response plan (preferably one that involves IT process automation to field incoming alerts), you can take a more proactive approach to handling potential security breaches. This can mean avoiding any downtime altogether and protecting precious assets in the process.

The key to success, of course, goes well beyond knowing the benefits and even rolling out a plan. It takes ongoing testing to ensure that everything is firing on all cylinders at all times. This will further protect your firm from incoming risks and place you one step ahead of the problems that are befalling others all around the world.

With new, more sophisticated cyber-attacks being hatched almost daily, there’s never been a more important time to invest in a quality incident response strategy. It starts with the infrastructure of prevention and IT process automation to ensure a closed-loop process. This will vastly reduce the risks of anything slipping through the cracks (like what happened to Target) and keep your business protected over the long-term.

Don’t wait until your company has become a victim of an online security breach. 









Why CIO’s See Automation as Essential for Improving IT Operations Efficiencies and You Should Too!

More and more CIOs are leveraging IT automation to improve operational efficiency and subsequently reduce company expenditure across the board.

In today’s still-unstable economic environment, it’s no surprise that businesses in every industry are focusing on cutting costs. Unfortunately, some view IT as a costly investment and an area in which the metaphorical belt can be tightened. What these people don’t realize, and what an increasing number of CIOs are embracing, is that implementing automation of IT operations can actually result in reduced expenditure overall.

CIOs that are concentrating on IT as a force of operational automation, integration and control are losing ground to executives who see technology as a business amplifier and a source of innovation. Ongoing advances in technology are now providing forward-thinking CIOs a much broader spectrum with which to work in terms of cutting costs across the entire organizational platform.

It has nothing to do with cutting IT capability, but rather finding ways to make IT operations more efficient. This is primarily achieved through automation, which significantly reduces the time and resources needed to run routine, repetitive and time-consuming tasks. When these tasks and workflows are automated, IT personnel are freed up to focus on other, more critical matters, thereby improving the overall operations of the department and subsequently the company as a whole.

Another way that CIOs are leveraging IT automation for the benefit of their entire operation is through improvement of incident management and mean time to resolution (MTTR). Critical system errors are costly and can have a significant impact on an organization’s bottom line. IT automation is allowing businesses to manage incidents and downtime scenarios more efficiently and in a much timelier manner, which means less risk of negative impact, both on the business and on the end user.

IT automation isn’t just becoming a tool for cutting costs. It’s also significantly improving business performance, which plays a key role in increasing revenue. According to a recent survey conducted by Gartner Executive Programs, the main focus of CIOs in the current climate is growth. They want to attract new customers and effectively retain their current ones. IT automation helps to improve service levels, thereby improving the customer experience.

In a time when budgets are at the forefront of every manager’s mind, from the top down to those on the front line, finding areas to improve service and lower expenditure has become a necessity. IT automation has opened up a number of opportunities for streamlining operations and improving efficiency, which ultimately achieves the goal of reducing costs and boosting enterprise growth. By applying technology as an amplifier to business operations, rather than as simply an individual component, organizations that are embracing IT automation are already reaping the benefits and are poised for ongoing success as we move toward the future.

Are you leveraging technology to streamline your business operations? 





eBook: 10 time consuming tasks you should automate




When it Comes to IT Security, Incident Response is Key

As many well-known organizations learned the hard way this year, security breaches can not only impact the bottom line, but they can severely damage your reputation. If people feel they cannot trust a retailer like Target or Home Depot without risk of their financial information being compromised, they simply won’t do business with them. It’s enough to put even the most successful company on the road to ruin. The problem is, security breaches like this happen on a much smaller scale by the millions each and every year.

Organizations of every shape, size and industry are vulnerable to hackers and would-be online thieves who prey on any opportunity they can get their hands on. So, how can businesses protect themselves from such a disaster? The answer lies in quality incident response.

What many companies mistakenly do is place all their trust in detection, like anti-malware tools. But, as the entire world learned following the Target debacle, this strategy isn’t always foolproof. In fact, if you’re not handling incoming incidents the right way, you could be placing your business in the same position as the others that have traveled down this dangerous and costly path.

Simply put, when it comes to maintaining the integrity of your sensitive data, prevention is always the best approach. Of course, there is no way to achieve 100% protection. You can come close, however, by designing a complementary incident management strategy that marries prevention with sound IT security practices. This ensures that in those instances when attacks manage to slip through the security measures that are in place, the incident response process will serve as a second line of defense.

Tips for Setting Up Your Own Incident Response Team
  • Choose the right personnel. This can include employees from within the organization who are at different levels and possess various skillsets. Generally speaking, most incident response teams are made up of workers in the following roles:

o   System Administrators
o   Network Administrators
o   IT Managers
o   Software Developers
o   Auditors
o   Security Architects
o   Disaster Recovery Specialists
o   Chief Technology Officers (CTOs)
  • Maintain accurate logs of applications, networks and operating systems. These should be checked daily by network administrators to ensure that all software is logging properly. Use of log analysis programs is also recommended

  • Logs should be automatically backed up and stored not only locally, but also externally. This is essential to proper recording and analysis
  • Ensure that all incidents are documented, both for auditing and compliance purposes as well as for future enhancements to IT best practices
  • Use quality software products that can improve the process and visibility of incident ownership
  • Incorporate  IT automation into the alert management process to improve prioritization, delivery and escalation of critical incidents
  • Establish a balance between reactive services (incident management and documentation) and proactive services (security audits, intrusion detection system maintenance, security strategy development, pre-incident analysis)
  • Set and implement schedules for all proactive service activities
  • Enlist a third party to conduct penetration tests at least once a year
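The tip about backing up logs both locally and externally can be sketched as follows. The directory arguments and the function name are assumptions for illustration; in practice the external location might be a mounted network share or an off-site storage service rather than another folder.

```python
import shutil
from pathlib import Path


def back_up_logs(log_dir, local_backup_dir, external_backup_dir):
    """Copy every *.log file in `log_dir` to both backup locations.

    Returns the sorted names of the files that were copied.
    """
    source = Path(log_dir)
    destinations = [Path(local_backup_dir), Path(external_backup_dir)]
    for destination in destinations:
        destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for log_file in sorted(source.glob("*.log")):
        for destination in destinations:
            shutil.copy2(log_file, destination / log_file.name)  # keeps timestamps
        copied.append(log_file.name)
    return copied
```

Scheduling a job like this daily keeps a second copy of the logs available for analysis even if the original system is compromised or offline.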
Additionally, the team tasked with handling incident response should include the following roles:
  • Team Lead – member in charge of all incident management activities
  • Incident Lead – member who reports directly to the Team Lead and coordinates all incident responses
  • IT Contact – coordinates communications between the Incident Response Team and IT Department
  • Legal Representative – member possessing experience in IT security policy and incident response tasked with mitigating risk of litigation
  • Public Relations Officer – handles all communications regarding security incidents

Given the fact that cyber risks are at an all-time high, and with criminals learning newer, more sophisticated ways to hack, there has never been a more critical time for businesses to employ proper security measures. The most effective way to do so is by developing and implementing a quality incident response strategy. The tips highlighted above should provide a good foundation and help establish your organization in a much more secure position moving forward.




