ITIL Intermediate OSA - Incident Management Tutorial

Welcome to lesson 3 ‘Incident Management’ of the ITIL Intermediate OSA Tutorial, which is a part of the ITIL Intermediate OSA Certification Course. In this lesson, we will learn about Incident Management.

Let us begin with the objectives of this lesson.

Objectives

By the end of this ‘Incident Management’ lesson, you will be able to:

  • Discuss the objectives, scope, activities, key concepts, challenges, risks, interfaces, and success factors of Incident Management.

  • Understand the process flow, escalations, categorization, and prioritization of Incident Management.

Let us look into the purpose of incident management in the next section.

What is the purpose of Incident Management?

The purpose of incident management is to restore normal service operation as quickly as possible and to minimize the adverse impact on business operations, thereby ensuring that agreed levels of service quality are maintained.

The next section talks about the objective of incident management.

What is the objective of Incident Management?

There are many objectives of incident management, such as:

  • Standardizing the methods and procedures used for efficient and prompt response, analysis, documentation, ongoing management, and reporting of incidents.

  • Increasing the visibility and communication of incidents to business and IT support staff.

  • Enhancing business perception of IT through use of a professional approach to quickly resolving and communicating incidents when they occur.

  • Aligning incident management activities and priorities with those of the business.

  • Maintaining user satisfaction with the quality of IT services.

Next, let’s look into the scope of incident management.

What is the scope of incident management?

Incident Management includes any event which disrupts, or which could disrupt a service. This includes events which are communicated directly by users, either through the Service Desk or through an interface from Event Management to Incident management tools.

Incidents can also be reported and/or logged by technical staff. This does not mean that all events are incidents. Many types of events are not related to disruptions at all, but are indicators of normal operation or are simply informational.

In the next section, we will learn about the value of incident management to the business.

Incident Management - Value to Business

Incident Management provides major value to the business through the ability to detect and resolve incidents. This results in lower downtime for the business, which in turn means higher availability of the service. It means that the business is able to exploit the functionality of the service as designed.

The next value addition is the ability to align IT activity with real-time business priorities. This is because incident management includes the capability to identify business priorities and dynamically allocate resources as necessary.

The ability to identify potential improvements to services is one of the very important values provided by incident management. This happens as a result of understanding what constitutes an Incident and also from being in contact with the activities of business operational staff.

Let us move on to learn about the policies of incident management.

Policies of Incident Management

Like any other process, incident management also has a set of policies. Policies can be put as rules. The different policies for incident management are stated below:

  • Incidents and their status must be communicated in a timely and effective manner.

  • Incidents must be resolved within timeframes acceptable to the business.

  • Customer Satisfaction must be maintained at all times.

  • Incident processing and handling should be aligned with overall service levels and objectives

  • All Incidents should be stored and managed and should subscribe to a standard classification schema

  • Incident records should be audited on a regular basis

  • All incident records should utilize a common format and a common and agreed set of criteria for prioritization and escalation.

In the next few sections, we will learn about the key concepts of Incident Management. 

Incident Management - Key Concepts

The key concepts of Incident Management are Timescales, Incident Models, Major incident, and Service request. These are discussed below:

Timescales

Timeframes must be agreed for all Incident-handling stages (these will differ depending upon the priority level of the Incident) – based upon the overall Incident response and resolution targets within SLAs – and captured as targets within OLAs and Underpinning Contracts (UCs). All support groups should be made fully aware of these timeframes.

Incident Models

Service Management tools should be used to automate timeframes and escalate the incident as required, based on predefined rules. Many incidents are not new and can be dealt with in the same way each time they occur.

For this reason, many organizations find it helpful to define standard Incident Models and apply them to appropriate incidents when they occur. An incident model is a way of predefining the steps that should be taken to handle an incident in an agreed way. Support tools can then be used to manage the required process.

The Incident Model should include the predefined steps that should be taken to handle the Incident and the chronological order of these steps should be considered with any associated dependencies. Roles and responsibilities of those involved should be defined within it.

Timeframes and thresholds for completion of actions should also be documented, together with escalation procedures stating who should be contacted and when. Any necessary extra information that may need to be recorded (particularly relevant for security- and capacity-related Incidents) can also be a part of it.
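To make the contents of an Incident Model concrete, the following is a minimal sketch of how one could be represented as data. The class names, field names, and the sample "disk full" model are illustrative assumptions and do not come from ITIL or from any particular Service Management tool.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class ModelStep:
    """One predefined step of an incident model (hypothetical structure)."""
    order: int                # chronological position of the step
    action: str               # what has to be done
    responsible_role: str     # who is responsible for doing it
    time_limit: timedelta     # threshold for completing the step
    escalate_to: str          # who to contact if the threshold is exceeded
    depends_on: List[int] = field(default_factory=list)  # prerequisite steps


@dataclass
class IncidentModel:
    """A predefined, agreed way of handling one type of incident."""
    name: str
    steps: List[ModelStep]
    extra_information: List[str] = field(default_factory=list)

    def ordered_steps(self) -> List[ModelStep]:
        """Return the steps in chronological order."""
        return sorted(self.steps, key=lambda s: s.order)

    def overdue_contact(self, order: int, elapsed: timedelta) -> Optional[str]:
        """Return who to contact if the given step has exceeded its time limit."""
        step = next(s for s in self.steps if s.order == order)
        return step.escalate_to if elapsed > step.time_limit else None


# Sample model with made-up steps, roles, and timeframes.
disk_full = IncidentModel(
    name="Disk full on file server",
    steps=[
        ModelStep(2, "Free space or extend the volume", "Second-line support",
                  timedelta(hours=2), "Operations manager", depends_on=[1]),
        ModelStep(1, "Confirm the affected server", "Service Desk",
                  timedelta(minutes=15), "Service Desk supervisor"),
    ],
    extra_information=["Current capacity figures"],
)
```

Calling `disk_full.overdue_contact(1, timedelta(minutes=30))` returns the contact for the first step, because 30 minutes exceeds its 15-minute threshold.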

Major Incident

A separate procedure, with shorter timeframes and greater urgency, must be used for “major” Incidents. A definition of what constitutes a Major Incident must be agreed and ideally mapped on to the overall Incident prioritization system.

Service Request

Service Request is a request from a user for information, or advice, or for a standard change or for access to an IT service.

Examples include a request to reset a password or to provide standard IT services for a new user. Service Requests are usually handled by the Service Desk and do not require an RFC to be submitted. These types of Service Requests can be performed through standard changes.

Next, let us learn about the process flow of incident management.

Process Flow of Incident Management

The incident management process consists of the following steps:

Step 1. Identification

The incident is detected or reported. This could happen through event management, or the user impacted could register it through a web interface or over a phone call or through email.

Step 2. Registration

When an incident is reported by any of the means described above, the incident is logged and a record is created.

Step 3. Categorization

The registered incident is coded by type, status, impact, urgency, SLA, et cetera. This is called incident categorization. At this step, it may become clear that the reported issue is not an incident but a service request or a change proposal. If it is a service request, it is categorized as such and handled as per the request fulfillment process.

If it is a change proposal, it is categorized as such and handled by service portfolio management. For example, a user calls in to report that her email is not working and the service desk person realizes that her email has not been configured, so it’s not an incident but a service request for email configuration.
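The routing decision described in this step can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `RecordType` enum and the `route` function are hypothetical names, and a real tool would hold far more detail per record.

```python
from enum import Enum


class RecordType(Enum):
    """What the reported issue turns out to be (hypothetical enum)."""
    INCIDENT = "incident"
    SERVICE_REQUEST = "service request"
    CHANGE_PROPOSAL = "change proposal"


# Which process handles each record type, as described in the text above.
HANDLING_PROCESS = {
    RecordType.INCIDENT: "Incident Management",
    RecordType.SERVICE_REQUEST: "Request Fulfillment",
    RecordType.CHANGE_PROPOSAL: "Service Portfolio Management",
}


def route(record_type: RecordType) -> str:
    """Return the name of the process that should handle the record."""
    return HANDLING_PROCESS[record_type]


# The email example: the mailbox was never configured, so this is a request.
email_case = route(RecordType.SERVICE_REQUEST)
```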

Step 4. Prioritization

Once the Incident has been categorized, it is assigned an appropriate prioritization code to determine how the incident is to be handled by support tools and support staff. Recall that priority is decided on the basis of the impact and urgency of the issue.

Major Incidents are also identified during this step. If an incident is found to be a Major Incident, it is acted upon as per the procedures defined for major incidents.

Step 5. Initial Diagnosis

After prioritization, an initial diagnosis is carried out to try to discover the full symptoms of the incident.

Step 6. Escalation

When the service desk cannot resolve the incident itself, the incident is escalated for further support; this is called functional escalation. If the incident is more serious, the appropriate IT managers must be notified; this is called hierarchical escalation.

Functional escalation is based on knowledge or expertise and is also known as “Horizontal Escalation.” Hierarchical escalation allows authorized line management to take corrective action. It is also known as “Vertical Escalation” and is usually done when the resolution of an incident will not be in time or will not be satisfactory to the end user.
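The two escalation types are independent of each other, so one incident may need neither, one, or both. The sketch below illustrates this with a hypothetical function; the three boolean inputs are simplifying assumptions, and in practice these judgments come from the Service Desk and from SLA monitoring.

```python
def escalation_types(service_desk_can_resolve: bool,
                     is_serious: bool,
                     resolution_at_risk: bool) -> set:
    """Return the set of escalations an incident needs.

    service_desk_can_resolve: the Service Desk has the knowledge to fix it
    is_serious:               the incident is serious enough to notify managers
    resolution_at_risk:       resolution will not be in time or satisfactory
    """
    needed = set()
    if not service_desk_can_resolve:
        needed.add("functional")      # horizontal: route to more expertise
    if is_serious or resolution_at_risk:
        needed.add("hierarchical")    # vertical: notify line management
    return needed


# An incident the Service Desk cannot fix and that is about to miss its target.
both = escalation_types(service_desk_can_resolve=False,
                        is_serious=False,
                        resolution_at_risk=True)
```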

Step 7. Investigation and Diagnosis

If no escalation is required and if there is no known solution, the incident is investigated. This investigation for a solution could also happen at a functional escalation level.

Step 8. Resolution and Recovery

Once the solution has been found, the solution is applied and the issue can be resolved.

Step 9. Closure

Finally, the service desk should check that the incident is fully resolved, that the service has been restored to a fully functional level, and that the user is satisfied with the solution. The incident can then be closed. The key point about incident management is that the Service Desk typically owns, and is accountable for, all Incidents. It monitors progress and manages escalation of Incidents.

In the next section, let us talk about Incident Management – Activities.

Incident Management Activities

Action to resolve an incident cannot take place until the incident has been identified. It is generally not considered good practice for the technology team to wait until an impacted user reports the incident to the Service Desk.

Therefore, all key components need to be monitored so that failures or potential failures are detected early so that the Incident Management process can be started. The quality of the Incident identification will be heavily dependent on the Event Management process.

All Incidents must be fully logged and date/time stamped, regardless of whether they are raised through Service Desk telephone call or whether automatically detected via an event alert. All relevant information relating to the nature of the incident must be logged so that a full historical record is maintained. This will help in referring the incident to other support group(s), who will have all relevant information on hand to assist them.

While logging an incident, one should ensure that the following information is updated:

  • Incident categorization

  • Incident urgency

  • Incident impact

  • Incident prioritization

  • Name/ID of the person and/or group recording the Incident

  • Name/department/phone/location of user

  • Description of symptoms

  • Incident status (active, waiting, closed, etc.)

  • Related CI, support group/person to which the Incident is allocated

  • Related Problem/Known Error

  • Activities undertaken to resolve the incident

  • Closure category
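The fields listed above can be pictured as a single record structure. The following is a minimal sketch, assuming hypothetical field names and a free-text status; it is not the schema of any particular incident management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class IncidentRecord:
    """Illustrative incident record holding the fields listed above."""
    category: str
    urgency: str
    impact: str
    priority: int
    recorded_by: str                 # name/ID of the person or group logging it
    user_name: str
    user_department: str
    user_phone: str
    user_location: str
    symptoms: str
    status: str = "active"           # e.g. active, waiting, closed
    related_ci: Optional[str] = None
    assigned_to: Optional[str] = None       # support group or person
    related_problem: Optional[str] = None   # problem or known error reference
    closure_category: Optional[str] = None
    logged_at: datetime = field(default_factory=datetime.now)  # date/time stamp
    activities: list = field(default_factory=list)  # (timestamp, description)

    def add_activity(self, description: str) -> None:
        """Append a time-stamped note so a full historical record is kept."""
        self.activities.append((datetime.now(), description))


# Sample record with made-up values.
record = IncidentRecord(
    category="Software", urgency="High", impact="Medium", priority=2,
    recorded_by="Service Desk", user_name="A. User", user_department="Finance",
    user_phone="n/a", user_location="Head office",
    symptoms="Email client does not start",
)
record.add_activity("Restarted the mail client; fault persists")
```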

Once an incident is logged, it is essential to categorize it by type, which helps in resolving it more easily.

Let’s learn about categorization in the next section.

Incident Management - Categorization

To ensure a quick response to recorded incidents, segregating them into different categories is very important. Let us see how this can be done.

The final part of the initial phase is to allocate suitable Incident categorization coding so that the exact type of Incident is recorded.

This will be important later when looking at Incident types/frequencies to establish trends for use in Problem Management, Supplier Management, and other ITSM activities. Multi-level categorization is available in most tools – usually to three or four levels of granularity.
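As a rough illustration of why multi-level categories help with trend analysis, a category can be stored as a path from the most general level to the most specific, and incidents can then be counted at any chosen depth. The category names and the `count_by_level` helper below are invented for the example.

```python
from collections import Counter


def count_by_level(categories, depth):
    """Count incidents by the first `depth` levels of their category path."""
    return Counter(path[:depth] for path in categories)


# Each logged incident carries a three-level category (hypothetical values).
logged = [
    ("Hardware", "Server", "Memory board"),
    ("Hardware", "Server", "Disk"),
    ("Software", "Application", "Finance suite"),
]

top_level = count_by_level(logged, 1)
second_level = count_by_level(logged, 2)
```

Here `top_level[("Hardware",)]` is 2, while counting at depth 3 would separate the memory board and disk incidents.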

Note: Please note that the check for Service Requests in this step does not imply that Service Requests are Incidents. This is simply the recognition of the fact that Service Requests are sometimes incorrectly logged as Incidents (e.g., a user incorrectly enters the request as an Incident from the web interface). This check will detect any such requests and ensure that they are passed to the Request Fulfillment process.

The following figure is an example of multi-layered categorization.

Post categorization, prioritizing the incident for quick response is very important.

Let’s learn about prioritization in the next section.

Incident Management - Prioritization

Prioritization determines how an incident is handled by both support tools and support staff. It can normally be determined by taking into account both the urgency of the Incident (how quickly the business needs a resolution) and the level of impact it is causing.

In all cases, clear guidance – with a practical example – should be provided for all support staff to enable them to determine the correct urgency and impact levels, so the correct priority is allocated. Such guidance should be produced during service level negotiation.

The following table shows an example of a priority coding system.

                     Impact
Urgency      High    Medium    Low

High         1       2         3

Medium       2       3         4

Low          3       4         5
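The example priority coding system can be expressed as a lookup keyed by urgency and impact. This is a minimal sketch of that one example matrix; the function name and the lower-case keys are assumptions, and each organization would substitute its own agreed codes.

```python
# Priority codes from the example table, keyed by (urgency, impact).
PRIORITY = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


def priority(urgency: str, impact: str) -> int:
    """Look up the priority code for an urgency and impact level."""
    return PRIORITY[(urgency.lower(), impact.lower())]


# Medium urgency combined with low impact gives priority 4.
example_code = priority("Medium", "Low")
```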

Once the incident has been categorized and prioritized, diagnosis of it is done to find a solution.

Let us learn about escalation in the next section.

Incident Management - Escalation

When does an escalation of incident happen?

Incident routing is called horizontal or functional escalation and primarily takes place due to lack of knowledge or expertise. As soon as it becomes clear that the Service Desk is unable to resolve the incident itself, the Incident must be immediately escalated for further support.

If the organization has a second-level support group and the Service Desk believes that the Incident can be resolved by that group, it should refer the Incident to them. If the second-level support group cannot resolve the Incident, it must be escalated to the third-level support group.

The third-level group could be internal or an external third party. The rules for escalation and handling of Incidents must be agreed in OLAs and UCs with internal and external groups respectively. When referring Incidents, care should be taken by the Service Desk to ensure that SLA resolution times are not exceeded.

The two types of escalation are discussed below:

Vertical or hierarchical escalation:

Vertical or hierarchical escalation can take place at any moment during the Incident Lifecycle. It usually occurs when major Incidents are reported or when it becomes apparent that an Incident will not be resolved in time, which results in breached SLAs.

This allows the relevant authority to take corrective action. Escalation never turns an Incident into a problem, although it may result in ownership of an Incident passing to the Problem Manager for administrative reasons and/or the identification of an associated Problem.

The Service Desk owns the Incident throughout its lifecycle, regardless of where it has been escalated! The Service Desk is responsible for tracking progress, keeping users informed and ultimately for Incident closure.

Functional escalation:

When the Service Desk can’t resolve an incident itself (or when first level resolution target times are to be breached) the incident must be escalated for further support.

This can be to a second-line support group. If the incident requires deeper technical knowledge, it can be escalated to a third-line group. This could be an internal department or a third party such as a software supplier or hardware manufacturer.

The following picture shows the flow of hierarchical escalation and functional escalation.

In this section, we have learned about the two types of escalation. Let us now move on to the resolution and recovery of incidents.

Incident Management - Resolution and Recovery

Resolution is an important part of incident management. This is the step where an incident is resolved or a resolution is identified. When a potential resolution has been identified, it should be applied and tested.

The specific actions to be undertaken and the people who will be involved in taking the recovery actions may vary, depending on the nature of the fault. Even when a resolution has been found, sufficient testing must be performed to ensure that the recovery action is completed and that the service has been fully restored. The resolving group should pass the Incident back to the Service Desk for closure action.

Let’s look into the closure step of incident management.

Incident Management - Closure

Let us look at the steps involved in closure.

The Service Desk should check that the Incident is fully resolved and that the users are satisfied and willing to agree that the Incident can be closed.

The Service Desk should also check the following:

Closure categorization:

Check and confirm that the initial Incident categorization was correct or, where the categorization subsequently turned out to be incorrect, update the record so that a correct closure categorization is recorded for the Incident, seeking advice or guidance from the resolving group(s) as necessary.

User satisfaction survey:

Carry out a user satisfaction call-back or e-mail survey for the agreed percentage of Incidents.

Incident Documentation:

Chase any outstanding details and ensure that the incident record is fully documented so that a full historical record at a sufficient level of detail is complete.

Ongoing or recurring Problem:

Determine (in conjunction with resolution groups) whether it is likely that the Incident could recur and decide whether any preventive action is necessary to avoid this. In conjunction with problem management, raise a problem record in all such cases so that preventive action is initiated.

Formal closure:

Formally close the Incident Record. Even with a mature Incident process well managed, there will be occasions when Incidents recur even though they have been formally closed. Because of such cases, it is wise to have predefined rules about if and when an Incident can be reopened.

We have looked at all phases of incident management so far. However, you might question what happens when there is a recurrence of incidents? The answer to this question is available in the next section.

Rules for reopening incidents

What happens when the incidents are recurring?

Despite all adequate care, there will be occasions when incidents recur even though they have been formally closed. Because of such cases, it is wise to have predefined rules about if and when an incident can be reopened. Whatever rule is chosen must take into account its effect on data collection, so that the recurrence and associated work are clearly recorded and accurately reported.

It might make sense, for example, to agree that if the incident recurs within one working day then it can be reopened – but that beyond this point a new incident must be raised, but linked to the previous incident/s. The exact time/thresholds may vary between individual organizations but clear rules should be agreed and documented.
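The example rule above can be sketched as a small function. This is a simplification under stated assumptions: the threshold is treated as 24 elapsed hours rather than a true working day (weekends and business hours are ignored), and the names `REOPEN_WINDOW` and `handle_recurrence` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed threshold; each organization agrees and documents its own value.
REOPEN_WINDOW = timedelta(days=1)


def handle_recurrence(closed_at: datetime, recurred_at: datetime,
                      window: timedelta = REOPEN_WINDOW) -> str:
    """Decide whether a recurring incident is reopened or raised anew."""
    if recurred_at - closed_at <= window:
        return "reopen"
    # Beyond the window, raise a new incident linked to the previous one.
    return "new linked incident"


closed = datetime(2024, 1, 10, 9, 0)
same_day = handle_recurrence(closed, datetime(2024, 1, 10, 15, 0))
two_days_later = handle_recurrence(closed, datetime(2024, 1, 12, 9, 0))
```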

Like event management, incidents also have their own trigger points. Let us look at the details.

Incident Management - Triggers

Incidents can be triggered in many ways. The most common route is when a user rings the Service Desk or completes a web-based incident-logging screen, but increasingly incidents are raised automatically via Event Management tools.

Technical staff may notice potential failures and raise an incident, or ask the service desk to do so, so that the fault can be addressed. Some incidents may also arise at the initiation of suppliers, who may send some form of notification of a potential or actual difficulty that needs attention.

In the next section, we will learn about the inputs and outputs of incident management.

Inputs and Outputs of Incident Management

As we all know every process has its own set of inputs as well as outputs. Here we will go through the different inputs and the outputs of the process.

Let us start with the inputs of incident management.

Inputs for the incident management could be:

  • Information about CIs and their status

  • Information about known errors and their workarounds

  • Communication and feedback about incidents and their symptoms

  • Communication and feedback about RFCs and releases that have been implemented or planned for implementation

  • Communication of events that were triggered by event management

  • Operational and Service Level Objectives

  • Customer feedback on the success of incident resolution activities and the overall quality of incident management activities

  • Agreed criteria for prioritizing and escalating incidents

Outputs for the process could be:

  • Resolved Incidents and actions taken to achieve their resolution

  • Updated Incident Management Records with accurate incident detail and history

  • Updated classification of incidents to be used to support proactive problem management activities

  • Raising of problem records for incidents where an underlying cause has not been identified

  • Validation that incidents have not recurred for problems that have been resolved

  • Feedback on incidents related to changes and releases

  • Identification of CIs associated with or impacted by incidents

  • Satisfaction feedback from customers who have experienced incidents

  • Feedback on level and quality of monitoring technologies and event management activities

  • Communication about incidents and resolution history detail to assist with identification of overall service quality.

In the next section, we will be learning the interfaces of incident management.

Incident Management - Interfaces

Incident management interfaces with several other service management processes, including:

Problem Management

Incident management forms part of the overall process of dealing with Problems. Incidents are often caused by underlying Problems, which must be solved to prevent the Incident from recurring. Incident Management provides a point where these are reported.

Configuration Management

Configuration management provides the data used to identify and progress Incidents. The CMS contains information about which categories of Incident should be assigned to which support group. In turn, incident management can maintain the status of faulty CIs.

It can also assist configuration management to audit the infrastructure when working to resolve an Incident. And that’s the relationship configuration management shares with incident management.

Change Management

Whenever there is a change required to implement a workaround or resolution, this will need to be logged as an RFC and progressed through change management. In turn, incident management is able to detect and resolve Incidents that arise from failed changes.

Availability Management

Availability Management will use incident management data to determine the availability of IT services and look at improvements.

Service Level Management

There is a specific relationship between service level management and incident management. The ability to resolve Incidents in a specified time is a key part of delivering an agreed level of service. Incident Management enables SLM to define measurable responses to service disruptions.

It also provides reports that enable SLM to review SLAs objectively and regularly. 

Incident Management - Metrics and Information Management

Based on the goals of the target audience (operational, tactical, or strategic), the service owners need to define what they should measure in a perfect world.

To do this they must:

  • Map the activities of the process that need to be measured.

  • Consider what measurements would indicate that each service and Service management activity is being performed consistently and can determine the health of the process.

  • Identify the measurements that can be provided based on existing toolsets, organizational culture, and process maturity.

Note: There may be a gap between what can be measured and what should be measured. When initially implementing processes, do not try to measure everything; rather, be selective about which measures will help to understand the health of a process. A major mistake many organizations make is trying to do too much in the beginning. Be smart about what you choose to measure.

Incident Management - Information Management

IT must now be able to measure and report against an end-to-end service. This information will be important in feeding CSI enabling it to answer any business questions.

Therefore, for information management, one should ensure that the following are maintained:

  • Incident management tools, which include resolution actions and history

  • Incident Record Data, which includes incident classification, details of any action taken, incident category, impact, urgency, priority

  • Relationship with other incidents, problems, changes or known errors.

So far we have studied metrics and information management. Let us now look at the challenges of incident management.

Incident Management - Challenges

There can be multiple challenges while implementing incident management such as:

  • Detecting Incidents as early as possible. This requires educating the users who report Incidents and configuring Event Management tools, which can be a considerable challenge for the organization.

  • Convincing all staff (technical teams as well as users) that all incidents must be logged, and encouraging the use of self-help web-based capabilities.

  • Availability of information about problems and known errors will enable the incident management staff to learn from previous Incidents and also to track the status of resolutions.

  • Integration of CMS to determine the relationship between CIs and to refer history of CIs while performing first-line support.

  • Alignment with the SLM process will assist Incident Management correctly to assess the impact and priority of Incidents and assists in defining and executing escalation procedures.

We have looked at the challenges of Incident Management, let us now learn about the Critical Success Factors and KPIs.

Incident Management - CSFs and KPIs

The following list includes some sample CSFs for Incident Management. Each organization should identify appropriate CSFs based on its objectives for the process. Each sample CSF is followed by a small number of typical KPIs that support the CSF.

These KPIs should not be adopted without careful consideration. Each organization should develop KPIs that are appropriate for its level of maturity, its CSFs and its particular circumstances.

Achievement against KPIs should be monitored and used to identify opportunities for improvement, which should be logged in the continual service improvement (CSI) register for evaluation and possible implementation.

The following table depicts the CSFs and their corresponding KPIs:

CSF: Resolve incidents as quickly as possible, minimizing impacts on the business.

KPIs:

  • Mean elapsed time to achieve incident resolution or circumvention, broken down by impact code

  • Breakdown of incidents at each stage (e.g. logged, work in progress, closed etc.)

  • Percentage of incidents closed by the service desk without reference to other levels of support

  • Number and percentage of incidents resolved remotely, without the need for a visit

  • Number of incidents resolved without impact to the business

CSF: Maintain quality of IT services.

KPIs:

  • Total number of incidents (as a control measure)

  • Size of current incident backlog for each IT service

  • Number and percentage of major incidents for each IT service

CSF: Maintain user satisfaction with IT services.

KPIs:

  • Average user/customer survey score

  • Percentage of satisfaction surveys answered versus the total number of satisfaction surveys sent
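Two of the sample KPIs can be computed directly from incident records, as the following sketch shows. The dictionary keys (`impact`, `hours_to_resolve`, `escalated`) and the sample figures are invented for illustration; a real report would read these values from the incident management tool.

```python
from collections import defaultdict
from statistics import mean


def mean_resolution_hours_by_impact(incidents):
    """Mean elapsed time to resolution, broken down by impact code."""
    by_impact = defaultdict(list)
    for inc in incidents:
        by_impact[inc["impact"]].append(inc["hours_to_resolve"])
    return {impact: mean(hours) for impact, hours in by_impact.items()}


def first_line_closure_percentage(incidents):
    """Percentage of incidents closed without reference to other support levels."""
    closed_first_line = sum(1 for inc in incidents if not inc["escalated"])
    return 100 * closed_first_line / len(incidents)


# Made-up sample data.
sample = [
    {"impact": "high", "hours_to_resolve": 2.0, "escalated": True},
    {"impact": "high", "hours_to_resolve": 4.0, "escalated": False},
    {"impact": "low", "hours_to_resolve": 10.0, "escalated": False},
    {"impact": "low", "hours_to_resolve": 6.0, "escalated": False},
]

mean_by_impact = mean_resolution_hours_by_impact(sample)
first_line_pct = first_line_closure_percentage(sample)
```

For this sample, the mean resolution time is 3.0 hours for high-impact incidents and 8.0 hours for low-impact ones, and 75% of incidents were closed at the first line.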

The next section talks about the risks associated with Incident Management.

Incident Management - Risks

There can be a number of risks, which can be associated with incident management. These risks are given below:

  • Being overambitious. Never try to improve everything at once; be realistic with timelines and expectations.

  • Not discussing improvement opportunities with the business. The business has to be involved in improvement decisions that will impact it.

  • Not keeping a balanced focus on both the services as a whole and incident management.

  • Not prioritizing improvement projects.

  • Not making strategic, tactical or operational decisions based on the knowledge gained. Reports should actually be used, and people should see that they are being used.

  • Lack of management taking action on recommended service improvement opportunities

  • Lack of meeting with the business to understand new business requirements

  • The communication/awareness campaign for any improvement is lacking, late or missing altogether

  • Not involving the right people at all levels to plan, build, test and implement the improvement.

Summary

Under Incident Management, we have learned about the incident management purpose, objective, scope, key concepts, categorization, prioritization, escalations, resolution and recovery, closure, challenges, risks, metrics and information management.

In the next lesson, we will be covering topics on Request Fulfillment.
