IT Operations in the Age of Coronavirus

Coronavirus has been a shock to the system for many IT organizations that are accustomed to working together in person. When you’re in an office, you can often use informal methods of communication – like swinging by someone’s desk, calling their office extension, or even passing along critical information when you run into them in the company cafeteria. And when urgent incidents requiring a real-time response occur, you often have a live network operations center (NOC) you can call into. It is staffed 24/7 with personnel ready to respond to incidents, corral the necessary people, and dial the (few) people who are remote into a phone bridge.

Much of what was possible a few weeks ago is no longer possible. The sudden, worldwide mandates from companies and health authorities to make work fully remote have upended all of these processes. What IT organizations need to do today is twofold: automate communication and incident response processes, and automate IT tasks.

Automate Communication and Incident Response Processes

When IT operations are conducted in person, operational processes are often ad hoc, with poorly defined chains of communication. In some sense, that’s why NOCs and their phone bridges or war rooms exist: they are a way to physically assemble people to deal with emergent or unpredictable situations. Now that physically assembling people is not an option, it’s time to invest in establishing standard, predictable workflows that can handle any kind of urgent, real-time operational incident, no matter where your IT staff are. This is especially critical if you’re in one of the verticals, like online education or video collaboration services, that are being heavily affected by the current crisis.
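To make "standard, predictable workflow" a little more concrete, here is a minimal sketch of an escalation policy written down as data rather than left to whoever happens to be in the room. Everything in it is an assumption made for illustration: the `EscalationStep` type, the `responder_at` function, the responder names, and the timings are hypothetical and are not taken from PagerDuty or any other product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EscalationStep:
    responder: str     # hypothetical on-call role to page at this step
    wait_minutes: int  # how long to wait for an acknowledgement before escalating


def responder_at(policy: List[EscalationStep], minutes_unacknowledged: int) -> Optional[str]:
    """Return who should be paged right now, or None once the policy is exhausted."""
    elapsed = 0
    for step in policy:
        elapsed += step.wait_minutes
        if minutes_unacknowledged < elapsed:
            return step.responder
    return None


# Illustrative policy: every name and timing here is made up.
POLICY = [
    EscalationStep("primary-on-call", 5),
    EscalationStep("secondary-on-call", 10),
    EscalationStep("engineering-manager", 15),
]

if __name__ == "__main__":
    for minutes in (0, 5, 15, 30):
        print(minutes, responder_at(POLICY, minutes))
```

The point of the sketch is that the chain of communication becomes explicit and testable: anyone, anywhere, can see who gets paged and when, without relying on a hallway conversation.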

PagerDuty has over ten years of experience helping customers to establish consistent, predictable incident response processes, and you can benefit from our knowledge by using resources like our Incident Response Guide.

Automate Daily IT Tasks and Remediate Alerts

Incident response processes generally require some action to be taken on systems or applications in order to resolve the incident. Again, when teams are physically co-located, it’s easy for IT professionals to simply log into systems, perform manual activities such as typing commands and running scripts, and report the results by voice to the team members assembled in a war room or on a conference bridge.

Once teams are remote, this kind of ad hoc task execution is difficult to perform safely. In some situations, such as with offshore managed service providers or highly secure environments, employees may not even be permitted to work remotely. In those cases, automating IT tasks is even more critical, so that incidents can, for example, kick off auto-remediation actions. It’s time to define standard automation recipes to achieve common tasks, reducing errors and improving knowledge sharing in a world where IT professionals don’t sit next to each other.
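As a rough illustration of what a set of "standard automation recipes" might look like, the sketch below keeps a registry that maps an alert type to a named remediation function, and falls back to human escalation when no recipe exists. The alert types, the `recipe` decorator, and the `remediate` function are all hypothetical; the recipes only return a description of the steps they would take, whereas a real one would act on the target system through whatever automation tool you use.

```python
from typing import Callable, Dict, List

# A recipe takes a host name and returns the list of steps it performed.
Recipe = Callable[[str], List[str]]

RECIPES: Dict[str, Recipe] = {}


def recipe(alert_type: str) -> Callable[[Recipe], Recipe]:
    """Register a function as the standard recipe for one alert type."""
    def register(func: Recipe) -> Recipe:
        RECIPES[alert_type] = func
        return func
    return register


@recipe("disk_full")
def free_disk_space(host: str) -> List[str]:
    # Placeholder steps only; nothing is executed on the host in this sketch.
    return [f"{host}: rotate logs", f"{host}: purge temp directory"]


@recipe("service_down")
def restart_service(host: str) -> List[str]:
    return [f"{host}: restart service", f"{host}: verify health check"]


def remediate(alert_type: str, host: str) -> List[str]:
    """Run the registered recipe, or hand off to a person if none is defined."""
    handler = RECIPES.get(alert_type)
    if handler is None:
        return [f"{host}: no recipe for '{alert_type}', escalate to a human"]
    return handler(host)


if __name__ == "__main__":
    print(remediate("disk_full", "web-01"))
    print(remediate("unknown_alert", "web-01"))
```

Keeping the recipes in one shared, named place is what delivers the error reduction and knowledge sharing described above: the fix no longer lives only in the head of the person who last typed the commands.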

PagerDuty and Ayehu: A Joint Solution for Incident Response and IT Task Automation

PagerDuty and Ayehu, a leading provider of automated IT incident remediation, have teamed up to create a joint solution for IT automation in the context of incident response. You can combine PagerDuty’s six free licenses of PagerDuty Starter (use the code “COVID-19” when signing up) with Ayehu’s five free workflows package. You can connect the two either by using custom incident actions in PagerDuty to initiate Ayehu workflows from a PagerDuty incident, or by incorporating those workflows into a PagerDuty automated response play.

To learn more about how PagerDuty and Ayehu are working together to help you rapidly re-engineer IT processes and improve communications between IT teams during major incidents, please click here.