Event Correlation
Event correlation is a technique for making sense of a large number of events and pinpointing the few events that are really important in that mass of information.

History

Event correlation has been used in telecommunications and industrial process control since the 1970s, in network management and systems management since the 1980s, in IT service management and event-based systems since the 1990s, and in business activity monitoring (BAM) since the early 2000s.

Event correlation in integrated management

The goal of integrated management is to integrate the management of networks (data, telephone and multimedia), systems (hosts and applications) and IT services in a coherent manner. The scope of this discipline notably includes network management, systems management and Service-Level Management.

Events and event correlator

Event correlation usually takes place inside one or several management platforms (also known as Network Management Stations or Network Management Systems). It is implemented by a piece of software known as the event correlator. This tool is automatically fed with events originating from managed elements, monitoring tools, the Trouble Ticket System, etc. Each event captures something special (from the event source standpoint) that happened in the domain of interest to the event correlator (e.g., the reboot of a device, a Service-Level Objective that is not met for a given customer, or the CPU of an e-business server that is used at 100% for over 15 minutes).

The event correlator plays a key role in the integration of management, for only there do network, system and service events come together. For instance, this is where the failure of a service can be ascribed to a specific failure in the underlying IT infrastructure.

Most event correlators can receive events from trouble ticket systems. However, only some of them are able to notify trouble ticket systems when a problem is solved, which partly explains why Service Desks find it difficult to stay up to date. In theory, the integration of management in organizations requires the communication between the event correlator and the trouble ticket system to work both ways.

An event may convey an alarm or report an incident (which explains why event correlation used to be called alarm correlation), but not necessarily. It may also report that a situation has returned to normal, or simply carry information that the event source deems relevant (e.g., policy P has been updated on device D). The severity of an event is an indication, given by the event source to the event destination, of the priority that the event should receive during processing.

Step-by-step decomposition

Event correlation can be decomposed into four steps: event filtering, event aggregation, event masking and root cause analysis. A fifth step (action triggering) is often associated with event correlation and is therefore briefly mentioned here.

Event filtering

Event filtering consists in discarding events that are deemed to be irrelevant by the event correlator. For instance, a number of bottom-of-the-range devices are difficult to configure and occasionally send events of no interest to the management platform (e.g., printer P needs A4 paper in tray 1). Another example is the filtering of informational or debugging events by an event correlator that is only interested in availability and faults.
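
The filtering step can be illustrated with a minimal Python sketch. The Event class, the severity names and the threshold policy are all assumptions made for this illustration; real event correlators define their own event formats and filtering rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str
    severity: str  # hypothetical levels: "debug", "info", "warning", "critical"
    message: str

# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"debug": 0, "info": 1, "warning": 2, "critical": 3}

def filter_events(events, min_severity="warning"):
    """Discard events whose severity is below the chosen threshold."""
    threshold = SEVERITY_RANK[min_severity]
    return [e for e in events if SEVERITY_RANK[e.severity] >= threshold]

events = [
    Event("printer-P", "info", "needs A4 paper in tray 1"),
    Event("router-R", "critical", "device rebooted"),
    Event("server-S", "debug", "heartbeat"),
]
kept = filter_events(events)  # only the router event survives
```

A correlator interested only in availability and faults would, under these assumptions, drop the informational printer event and the debugging heartbeat.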

Event aggregation

Event aggregation (also known as event de-duplication) consists in merging duplicates of the same event. Such duplicates may be caused by network instability (e.g., the same event is sent twice by the event source because the first instance was not acknowledged sufficiently quickly, but both instances eventually reach the event destination). Another example is temporal aggregation, when the same event is sent over and over again by the event source until the problem is solved.
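
As a rough sketch of de-duplication, the Python function below merges repeats of the same event that arrive close together in time. The tuple layout, the (source, message) identity key and the 60-second window are assumptions chosen for illustration, not properties of any particular product.

```python
def aggregate(events, window=60.0):
    """Merge repeated events into one record per burst.

    events: iterable of (timestamp, source, message), sorted by timestamp.
    Two events are treated as duplicates when they share source and message
    and the gap since the previous duplicate is at most `window` seconds.
    """
    open_group = {}  # (source, message) -> index of its latest record
    merged = []
    for ts, source, message in events:
        key = (source, message)
        idx = open_group.get(key)
        if idx is not None and ts - merged[idx]["last"] <= window:
            merged[idx]["count"] += 1
            merged[idx]["last"] = ts
        else:
            open_group[key] = len(merged)
            merged.append({"source": source, "message": message,
                           "first": ts, "last": ts, "count": 1})
    return merged

raw = [
    (0.0, "router-R", "link down"),
    (2.0, "router-R", "link down"),    # resent because the first was not acknowledged in time
    (30.0, "server-S", "disk full"),
    (500.0, "router-R", "link down"),  # far outside the window: a new occurrence
]
merged = aggregate(raw)
```

Keeping a count and the first and last timestamps preserves the information that the event was repeated, which a plain discard of duplicates would lose.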

Event masking

Event masking (also known as topological masking in network management) consists in ignoring events pertaining to systems that are downstream of a failed system. For example, servers that are downstream of a crashed router will fail availability polling; these failures are consequences of the router crash and can be masked.
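
A minimal Python sketch of topological masking follows. It assumes a hypothetical topology in which each node has at most one upstream node, stored in the UPSTREAM mapping; the node names are invented for the example.

```python
# Hypothetical topology: node -> the node directly upstream of it.
UPSTREAM = {
    "server-A": "router-R",
    "server-B": "router-R",
    "router-R": "core-switch",
    "server-C": "core-switch",
}

def has_failed_ancestor(node, failed, upstream):
    """Return True if any node upstream of `node` is in the failed set."""
    seen = set()
    current = upstream.get(node)
    while current is not None and current not in seen:
        if current in failed:
            return True
        seen.add(current)  # guards against accidental loops in the map
        current = upstream.get(current)
    return False

def mask(down_nodes, upstream):
    """Keep only the failures that are not downstream of another failure."""
    failed = set(down_nodes)
    return [n for n in down_nodes if not has_failed_ancestor(n, failed, upstream)]

down = ["server-A", "router-R", "server-B"]
unmasked = mask(down, UPSTREAM)  # the server failures are masked by the router failure
```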

Root cause analysis

Root cause analysis is the last and most complex step of event correlation. It consists in analyzing dependencies between events, based for instance on a model of the environment and dependency graphs, to detect whether some events can be explained by others. For example, if database D runs on server S and this server gets durably overloaded (CPU used at 100% for a long time), the event "the SLA for database D is no longer fulfilled" can be explained by the event "server S is durably overloaded".
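
The idea of explaining some events by others can be sketched in Python with a simple dependency graph. This is a deliberately simplified illustration: the DEPENDS_ON mapping and the component names are assumptions, the graph is assumed to be acyclic, and real root cause analysis usually also weighs timing and event types.

```python
# Hypothetical dependency model: component -> components it depends on.
DEPENDS_ON = {
    "database-D": ["server-S"],
    "webapp-W": ["database-D"],
}

def transitive_dependencies(component, depends_on):
    """Collect everything `component` depends on, directly or indirectly."""
    found, stack = set(), list(depends_on.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(depends_on.get(dep, []))
    return found

def root_causes(affected, depends_on):
    """Return affected components whose problem is not explained by a
    problem on something they depend on."""
    affected = set(affected)
    return sorted(
        c for c in affected
        if not (transitive_dependencies(c, depends_on) & affected)
    )

# Events were raised for the web application, the database and the server.
causes = root_causes({"webapp-W", "database-D", "server-S"}, DEPENDS_ON)
```

Under these assumptions only server-S remains, since the events on the database and the web application are both explained by the overloaded server they depend on.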

Action triggering

At this stage, the event correlator is left with at most a handful of events that need to be acted upon. Strictly speaking, event correlation ends here. However, through a loose use of the term, the event correlators found on the market (e.g., in network management) sometimes also include problem-solving capabilities. For instance, they may trigger corrective actions or further investigations automatically.
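
One simple way to picture action triggering is a dispatch table that maps the kind of a remaining event to a handler. The Python sketch below is purely illustrative: the event kinds, the handler functions and the returned strings are hypothetical, and a real correlator would call a ticketing or automation system instead of returning text.

```python
def open_ticket(event):
    # Stand-in for creating a trouble ticket.
    return f"ticket opened for {event['component']}"

def restart_service(event):
    # Stand-in for an automated corrective action.
    return f"restart requested for {event['component']}"

# Hypothetical mapping from event kind to the action to trigger.
ACTIONS = {
    "overload": open_ticket,
    "service_down": restart_service,
}

def trigger(event, actions=ACTIONS):
    """Run the action registered for this event kind, if any."""
    handler = actions.get(event["kind"])
    if handler is None:
        return None  # no automatic action; leave the event to an operator
    return handler(event)

result = trigger({"kind": "overload", "component": "server-S"})
```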

Event correlation in ITIL

The scope of ITIL (the Information Technology Infrastructure Library) is larger than that of integrated management. However, event correlation in ITIL is quite similar to event correlation in integrated management.

In the ITIL version 2 framework, event correlation spans three processes: Incident Management, Problem Management and Service Level Management.

In the ITIL version 3 framework, event correlation takes place in the Event Management process. The event correlator is called a correlation engine.

See also

  • Business activity monitoring
  • Complex event processing
  • ECA rules
  • Event stream processing
  • Event-driven architecture
  • Event-driven programming
  • Event-driven SOA
  • Incident management
  • Issue tracking system
  • IT service management
  • Network management
  • Problem management
  • Root cause analysis
  • Supervisory control and data acquisition (SCADA)
  • Systems management

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 