Keeping a Network Operations Center (NOC) running well around the clock is not an easy task. To help you rise to the challenge, we have compiled a list of 5 major issues you may have run into, along with a way to resolve each of them.
Extremely busy shifts
The NOC is usually a hectic place: there is always an incident to escalate, a service to restore, or a report to produce, all while other services still need monitoring. Usually, however, there is not enough manpower to accomplish all of these tasks successfully. To take the load off a busy shift, make a list of all known recurring problems or downtime events, each with a clear procedure for solving it. Then use eyeShare (an IT Process Automation tool) to build a simple workflow that will solve the problem for you. Take, for example, a critical application that crashes twice a week, where the solution is to connect remotely to a server and restart a service. Every time, the NOC operator has to open the procedure, check the server name, check the service name, and only then start connecting to the server. Solving this incident can take anywhere from 5 to 30 minutes, assuming the operator noticed the alert right away. If the monitoring system reported directly to eyeShare, the whole resolution process would end in less than a minute, including sending an email to the application’s manager and updating a ticket. By following these suggestions you will accomplish 2 things: a less busy shift and a shorter MTTR.
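The restart scenario above can be sketched in a few lines of Python. This is only an illustration of the idea, not eyeShare's actual interface: the `Alert` and `Runbook` classes, the server and service names, and the `notify` and `update_ticket` callbacks are all hypothetical, and the restart action is a stand-in for whatever remote command your environment uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    """What the monitoring system reports (hypothetical shape)."""
    server: str
    service: str
    ticket_id: str


@dataclass
class Runbook:
    """Maps (server, service) to a fix, replacing the manual procedure lookup."""
    actions: Dict[Tuple[str, str], Callable[[Alert], bool]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, server: str, service: str, action: Callable[[Alert], bool]) -> None:
        self.actions[(server, service)] = action

    def handle(self, alert: Alert, notify: Callable[[str], None],
               update_ticket: Callable[[str, str], None]) -> bool:
        action = self.actions.get((alert.server, alert.service))
        if action is None:
            # Unknown problem: leave it for the operator instead of guessing.
            self.log.append(f"no runbook entry for {alert.service} on {alert.server}")
            return False
        ok = action(alert)
        status = "restarted" if ok else "restart failed"
        message = f"{alert.service} on {alert.server}: {status}"
        notify(message)                          # e.g. email the application's manager
        update_ticket(alert.ticket_id, message)  # e.g. add a note to the ticket
        self.log.append(message)
        return ok


# Usage with stand-ins: a list plays the mailbox, a dict plays the ticket system.
runbook = Runbook()
sent: List[str] = []
tickets: Dict[str, str] = {}
runbook.register("app-server-01", "billing-service", lambda alert: True)  # fake restart
resolved = runbook.handle(
    Alert("app-server-01", "billing-service", "TICKET-1"),
    notify=sent.append,
    update_ticket=tickets.__setitem__,
)
```

Keeping the restart, notification, and ticket update as pluggable callbacks means the same workflow can be tested with stand-ins first and wired to real systems later.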
Daily tasks are highly time-consuming
The NOC is responsible for carrying out many day-to-day tasks: report production, manual monitoring, and preventative tasks such as disk space cleanup and service resets. Naturally, executing all of these tasks takes a lot of time and keeps NOC members from getting ahead with other projects that could advance the team. Map out all the tasks that have to be executed every shift, every day, or every week, and take the load off your people by automating them.
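As one example of such a preventative task, here is a minimal sketch of a disk space cleanup in Python. The function names, the 7-day retention period, and the file names are assumptions chosen for illustration; a real cleanup would need rules agreed with the owners of the data. The usage runs against a throwaway temporary directory, so nothing real is deleted.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def usage_percent(path: str) -> float:
    """Return how full the disk holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def cleanup_old_files(directory: str, max_age_days: int,
                      now: Optional[float] = None) -> List[Path]:
    """Delete regular files in `directory` last modified more than `max_age_days` ago."""
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Usage on a temporary directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.log"
    new_file = Path(tmp) / "new.log"
    old_file.write_text("stale")
    new_file.write_text("fresh")
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(old_file, (ten_days_ago, ten_days_ago))  # backdate the stale file

    percent_used = usage_percent(tmp)
    removed = cleanup_old_files(tmp, max_age_days=7)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

A scheduler could call `usage_percent` every shift and run the cleanup only when the disk passes a chosen threshold.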
Some team members lack technical knowledge
Not all NOC members have the same technical skills and knowledge, so some people may struggle with an incident whose solution requires advanced troubleshooting skills. The best solution in these cases is to have an expert solve the problem for you: an expert can identify all the troubleshooting steps and all the possible options, and create an automated workflow that solves the problem the same way every time.
Incidents are not solved within the NOC
The NOC often has to escalate incidents to other teams that are more qualified to handle them, or that have the necessary permissions to solve them. Automate such solutions to save the valuable time of a higher-tier team, or to avoid contacting on-call staff in the middle of the night (which is always an unpleasant task). Another option is to semi-automate the workflow, meaning that it consults whoever is on call before making important decisions.
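A semi-automated workflow of this kind can be pictured as a list of steps, some of which pause for a decision from the person on call. The Python below is a rough sketch under that assumption; the `Step` class, the step names, and the `ask_on_call` callback (which in practice might send a text message or chat prompt and wait for a reply) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    needs_approval: bool = False  # True for decisions the on-call person must make


def run_workflow(steps: List[Step], ask_on_call: Callable[[str], bool]) -> List[str]:
    """Run steps in order; stop at the first step the on-call person declines.

    Returns the names of the steps that were actually executed.
    """
    executed = []
    for step in steps:
        if step.needs_approval and not ask_on_call(step.name):
            break
        step.run()
        executed.append(step.name)
    return executed


# Usage: routine steps run unattended, the risky one waits for a yes or no.
actions_taken: List[str] = []
steps = [
    Step("collect logs", lambda: actions_taken.append("logs")),
    Step("restart service", lambda: actions_taken.append("restart")),
    Step("fail over database", lambda: actions_taken.append("failover"), needs_approval=True),
]
declined = run_workflow(steps, ask_on_call=lambda question: False)
approved = run_workflow(steps, ask_on_call=lambda question: True)
```

The on-call person is contacted only for the one decision that needs them, and only after the routine steps have already been done.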
Escalation process is unclear or complicated to follow
When an escalation is required, it is easy to get confused by the complexity of the escalation procedure, or by the fact that each service or system has a different procedure. In a busy shift it can be quite difficult to keep track of how much time has passed since the previous escalation step and which steps have already been executed, especially if several incidents are open at the same time. Automating the escalation processes of frequent incidents or top services will prevent the confusion and will ensure that your customers get their information correctly and on time, every time.
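The bookkeeping described above (how much time has passed, and which steps were already executed) is exactly what a small program does well. The sketch below assumes an escalation policy expressed as "minutes since the incident opened" paired with a contact; the policy values, contact names, and the `Incident` class are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Each entry: (minutes since the incident opened, who to notify at that point).
EscalationPolicy = List[Tuple[int, str]]


@dataclass
class Incident:
    service: str
    opened_at: float  # seconds, e.g. a value from time.time()
    notified: Set[str] = field(default_factory=set)


def due_escalations(incident: Incident, policy: EscalationPolicy, now: float) -> List[str]:
    """Return contacts whose escalation time has arrived and who were not yet notified.

    The returned contacts are recorded on the incident, so each is reported once.
    """
    elapsed_minutes = (now - incident.opened_at) / 60
    due = [
        contact
        for after_minutes, contact in policy
        if elapsed_minutes >= after_minutes and contact not in incident.notified
    ]
    incident.notified.update(due)
    return due


# Usage with fixed timestamps instead of a real clock.
policy: EscalationPolicy = [(0, "NOC shift lead"), (15, "application team"), (45, "service manager")]
incident = Incident(service="billing", opened_at=0.0)

at_1_min = due_escalations(incident, policy, now=60.0)
at_20_min = due_escalations(incident, policy, now=20 * 60.0)
at_20_min_again = due_escalations(incident, policy, now=20 * 60.0)
at_60_min = due_escalations(incident, policy, now=60 * 60.0)
```

Because each incident carries its own record of who was notified, several open incidents with different policies can be checked in the same loop without mixing up their steps.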