An Introduction to the Identification and Evaluation of IT Operational Risk

This first article introduces a planned series of articles and discusses, with some examples, the impact of change on risk levels. The objective of the series is to introduce broad categories of IT Operational Risk. After this introductory article, each article in the series will cover selected risk categories, specific risks, or the processes and techniques used to identify, report or manage risk.

The Impact of Change

One of the greatest generators of risk is the implementation of changes to the IT environment. Larger IT organisations can implement many thousands of changes every year, and some changes carry more risk than others. Reducing the level of change can significantly reduce the number of significant IT incidents.

In one financial institution, levels of change drop significantly during the two-week Christmas break, although minor changes continue to be implemented. Over that period a two-thirds drop in major incidents was seen, even though both business volumes and batch production jobs remained the same or increased slightly due to the season and the execution of year-end workloads.

Categories of Operational Risk

There are many taxonomies for describing risk; most tend to concentrate on Information Security or on complicated hierarchical structures. This article considers Information Security to be only one category of risk. The actual number of categories is not particularly significant, provided that appropriate risk review coverage has occurred. The purpose here is to present a range of operational risks. For each risk category, an example risk has been defined, together with a description of what happened when the risk occurred.

Operational Risk Assessment

The IT Operational Risk Assessment (ORA) process deals with the identification, assessment, reporting, remediation, and closure of risks associated with the introduction of new or changed IT components. The purpose of the process is to raise awareness of potential risk and allow management to make appropriate decisions about whether to proceed with the proposed change.

Risks are inherent in the business of operating IT systems, as there are risks associated with everything we do. First, IT operational risks are often hidden from view, unlike crossing the street, where approaching traffic can be seen and the risk more easily assessed. Second, while the risk itself may be defined, its impact on ongoing operations may be less well understood or not defined at all.

ORA is designed to identify and describe risks that may be forgotten, not assessed, or hidden from the view of the project team. As each risk is identified, it is described and given a risk rating covering both the likelihood of occurrence and the potential impact if an event were to occur.
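
The two-part rating described above can be sketched in a few lines of Python. The article does not prescribe a scale or a way of combining the two ratings, so the 1-to-5 scales, the multiplication of likelihood by impact, and the band thresholds below are all assumptions chosen for illustration. The names `Risk`, `score` and `band` and the sample risk are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One identified risk with its two ratings (hypothetical structure)."""
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-to-5 scale.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Assumed combination rule: likelihood multiplied by impact.
        return self.likelihood * self.impact

    @property
    def band(self) -> str:
        # Assumed thresholds; an organisation would set its own.
        if self.score >= 15:
            return "high"
        if self.score >= 6:
            return "medium"
        return "low"


# Invented example risk, for illustration only.
risk = Risk("Failover to the standby database has never been tested",
            likelihood=2, impact=5)
print(risk.score, risk.band)
```

Keeping likelihood and impact as separate fields, rather than storing only a combined score, preserves the distinction the process draws between how probable an event is and how damaging it would be.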

Many Information Security reviews concentrate on extensive Threat-Risk Assessment (TRA) processes and documentation, which tend to be lengthy to create and get approved. By contrast, the ORA process would be considered an ‘Agile’ process: preparation, assessment and publication of results take only a few days. The timetable for implementing changes in most organisations does not allow for a delay of weeks while a heavyweight process assesses residual risk in a proposed implementation. The process must suit the overall risk profile of the project, application or service, and the general principles of ORA can be adapted to that profile.

The ORA Process

1.   Identify the projects or changes that are considered important, high risk or prone to failure. This activity should be done early in the project as part of the project planning.

2.   Contact the project team, inform them that the project is subject to ORA, and explain what needs to be done to accommodate it.

3.   Have the project team complete checklists or questionnaires at appropriate times through the project.

5.   The ORA team will review the questionnaires, interview staff, read appropriate documentation, meet with key personnel and formulate risks.

6.   The ORA team writes a report to the project team describing its findings and any mitigating factors found.

7.   The ORA team reports its findings to the executives via the executive sponsor, including comments from the project team on the risks found.

8.   The ORA team transfers residual risks to a problem tracking system if the change goes ahead and is implemented with temporary mitigations or open risks.
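
The final step above can be sketched as a simple filter over the risks raised during the review. This is an illustration only: the `Status` values, the `Finding` record, the `residual_risks` function and the sample findings are hypothetical. A real implementation would call whatever interface the organisation's problem tracking system provides, which is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Assumed states of a risk at the point the change is approved."""
    CLOSED = "closed"                    # fully remediated before go-live
    TEMPORARY_MITIGATION = "temporary"   # workaround in place, permanent fix outstanding
    OPEN = "open"                        # accepted without mitigation


@dataclass(frozen=True)
class Finding:
    """A risk recorded in the ORA report (hypothetical structure)."""
    identifier: str
    description: str
    status: Status


def residual_risks(findings: list[Finding], change_approved: bool) -> list[Finding]:
    """Return the findings that should be carried into problem tracking."""
    if not change_approved:
        # Nothing is transferred if the change does not go ahead.
        return []
    return [f for f in findings if f.status is not Status.CLOSED]


# Invented example findings, for illustration only.
findings = [
    Finding("R1", "No tested back-out plan", Status.CLOSED),
    Finding("R2", "Monitoring covers only the primary site", Status.TEMPORARY_MITIGATION),
    Finding("R3", "Single administrator holds the deployment knowledge", Status.OPEN),
]
for finding in residual_risks(findings, change_approved=True):
    print(finding.identifier, finding.status.value, finding.description)
```

Only risks that are still open or temporarily mitigated are carried forward, so the problem tracking system receives exactly the residual risk that management accepted when approving the change.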

The Impact of Change

The single biggest cause of failure in industrial, commercial and financial applications[1] and associated IT infrastructure is change. Changes cause incidents which must be managed, and require associated problem management activity to fix the underlying problem(s). Operational Risk Management identifies risks which may become incidents and provides an opportunity to fix problems prior to their occurrence. ORA also allows the appropriate level of risk to be identified, and an appropriate assessment of whether that level of risk is acceptable to the business.

Changes to IT systems and applications are inevitable, and one can argue that the ITIL Change Process is the correct place to manage such changes. However, the ITIL Change Process deals mainly with the mechanics of moving a change through various environments; it spends less time evaluating the impact of change and no time discussing associated risk. Change Advisory Boards (CABs) are a requisite component of the ITIL process, but those boards generally have neither the time nor the skills to assess risks correctly unless the risks are presented in a concise and easily understood form.

If an ORA is available prior to final approval to proceed with the change, greater certainty as to risk and benefit can be applied to decision making. The assessment can be used by executives, change management personnel and operational teams as a basis for agreement to proceed, for the monitoring posture after implementation, and for the response to incidents following the change. Note that ORA is NOT a process for stopping or slowing down a change; it only provides a lens on the known residual risk associated with that change.

Future articles will address how to build an ORA process, what executive sponsorship and support is required to implement it, and the categories of operational risk.



[1] For certain other types of system such as medical, life safety and perhaps military applications change may not be the primary cause of failure.

 

 

John Comer is a pseudonym for an active IT professional working in the UK. He is an IT Operational Risk Manager at a large financial institution.