Alarm management
Alarm management is the application of human factors (or ergonomics, as the field is referred to outside the U.S.) along with instrumentation engineering and systems thinking to manage the design of an alarm system to increase its usability. Most often the major usability problem is that too many alarms are annunciated during a plant upset, commonly referred to as an alarm flood, by analogy with a flood in which excessive rainfall overwhelms a basically fixed drainage capacity. However, an alarm system can also have other problems, such as poorly designed alarms, improperly set alarm points, ineffective annunciation, and unclear alarm messages.

Alarm problem history

From their conception, large chemical, refining, power generation, and other processing plants required a control system to keep the process operating successfully and producing products. Because the control components were fragile compared to the process, these control systems often required a control room to protect them from the elements and process conditions. Early control rooms used what were referred to as "panel boards", which were loaded with control instruments and indicators. These were tied to sensors located in the process streams and on the outside of process equipment. The sensors relayed their information to the control instruments via 4-20 mA current loops carried over twisted-pair wiring. At first these systems merely yielded information, and a well-trained operator was required to make adjustments, either by changing flow rates or by altering energy inputs, to keep the process within its designed limits.

Alarms were added to alert the operator to a condition that was about to exceed a design limit, or had already exceeded one. Additionally, Emergency Shut Down (ESD) systems were employed to halt a process that was in danger of exceeding safety, environmental, or monetarily acceptable process limits. Alarms were indicated to the operator by annunciator horns and lights of different colors. (For instance, green lights meant OK, yellow meant not OK, and red meant bad.) Panel boards were usually laid out in a manner that replicated the process flow in the plant, so instrumentation for the operating units within the plant was grouped together for ease of recognition and problem solving. It was a simple matter to look at the entire panel board and discern whether any section of the plant was running poorly. This was due to both the design of the instruments and the implementation of the alarms associated with them. Instrumentation companies put a lot of effort into the design and individual layout of the instruments they manufactured, employing behavioral psychology practices that revealed how much information a human being could collect in a quick glance. More complex plants had more complex panel boards, and therefore often more human operators or controllers.

Thus, in the early days of panel board systems, alarms were regulated by both real estate and cost. In essence, they were limited by the amount of available board space and by the cost of running wiring and hooking up an annunciator (horn), an indicator (light), and switches to acknowledge and clear a resolved alarm. It was often the case that if you wanted a new alarm, you had to decide which old one to give up.

As technology developed, control systems and control methods were tasked with delivering a higher degree of plant automation with each passing year. Highly complex material processing called for highly complex control methodologies. Also, global competition pushed manufacturing operations to increase production while using less energy and producing less waste. In the days of the panel boards, a special kind of engineer was required, one who understood the electronic equipment associated with process measurement and control, the control algorithms necessary to control the process (PID basics), and the actual process that was being used to make the products. Around the mid-1980s, the digital revolution arrived. Distributed control systems (DCS) were a boon to the industry. The engineer could now control the process without having to understand the equipment necessary to perform the control functions. Panel boards were no longer required, because all of the information that once came across analog instruments could be digitized, stuffed into a computer, and manipulated to achieve the same control actions once performed with amplifiers and potentiometers.

As a side effect, alarms also became easy and cheap to configure and deploy. You simply typed in a location and a value to alarm on, and set the alarm to active. The unintended result was that soon people alarmed everything. Initial installers set an alarm at 80% and 20% of the operating range of any variable just as a habit. Another unfortunate part of the digital revolution was that what once covered several square yards of real estate now had to fit on a 17-inch computer monitor. Multiple pages of information were thus employed to replicate the information on the replaced panel board. Alarms were used to tell an operator to go look at a page he was not viewing. Alarms were used to tell an operator that a tank was filling. Every mistake made in operations usually resulted in a new alarm. With the implementation of the OSHA 1910 regulations, HAZOP studies usually requested several new alarms. Alarms were everywhere. Incidents began to accrue as too much data collided with too little useful information.

Alarm management history

Recognizing that alarms were becoming a problem, industrial control system users banded together and formed the Alarm Management Task Force (AMTF), a customer advisory board led by Honeywell, in 1990. The AMTF included participants from chemical, petrochemical, and refining operations. They gathered and wrote a document on the issues associated with alarm management. This group quickly realized that alarm problems were simply a subset of a larger problem, and formed the Abnormal Situation Management Consortium (ASM is a registered trademark of Honeywell). The ASM Consortium developed a research proposal and was granted funding from the National Institute of Standards and Technology (NIST) in 1994. The focus of this work was addressing the complex human-system interaction and the factors that influence successful performance for process operators. Automation solutions have often been developed without consideration of the human who needs to interact with the solution. In particular, alarms are intended to improve situation awareness for the control room operator, but a poorly configured alarm system does not achieve this goal.

The ASM Consortium has produced documents on best practices in alarm management, as well as operator situation awareness, operator effectiveness, and other operator-oriented issues. These documents were originally for ASM Consortium members only, but the ASMC has recently offered these documents publicly.

The ASM Consortium also participated in the development of an alarm management guideline published by the Engineering Equipment & Materials Users' Association (EEMUA) in the UK. The ASM Consortium provided data from its member companies and contributed to the editing of the guideline. The result is EEMUA 191, "Alarm Systems - A Guide to Design, Management and Procurement".

Several institutions and societies are producing standards on alarm management to assist their members in the best-practice use of alarms in industrial manufacturing systems. Among them are the ISA (ISA SP-18), API (API 1167), and NAMUR (NAMUR NA 102). Several companies also offer software packages to assist users in dealing with alarm management issues. Among them are DCS manufacturing companies and third-party vendors who offer add-on systems.

Concepts

The fundamental purpose of alarm annunciation is to alert the operator to deviations from normal operating conditions, i.e. abnormal operating situations. The ultimate objective is to prevent, or at least minimize, physical and economic loss through operator intervention in response to the condition that was alarmed. For most digital control system users, losses can result from situations that threaten environmental safety, personnel safety, equipment integrity, economy of operation, and product quality control as well as plant throughput. A key factor in operator response effectiveness is the speed and accuracy with which the operator can identify the alarms that require immediate action.

By default, the assignment of alarm trip points and alarm priorities constitutes basic alarm management. Each individual alarm is designed to provide an alert when its process indication deviates from normal. The main problem with basic alarm management is that these features are static: the resulting alarm annunciation does not respond to changes in the mode of operation or in the operating conditions.
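
The static behavior described above can be illustrated with a minimal sketch. The tag name, limits, and priority below are hypothetical and are not taken from any particular control system; the limits simply echo the habitual 20% and 80% settings mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticAlarm:
    """A basic alarm: fixed trip points and a fixed priority (all values hypothetical)."""
    tag: str
    low_limit: float
    high_limit: float
    priority: int  # 1 = most urgent

    def is_active(self, value: float) -> bool:
        # The same limits apply in every operating mode; the check knows
        # nothing about whether the unit is running, starting up, or shut down.
        return value < self.low_limit or value > self.high_limit


# Hypothetical feed-flow alarm set at 20% and 80% of range.
FEED_FLOW = StaticAlarm(tag="FI-101", low_limit=20.0, high_limit=80.0, priority=2)

print(FEED_FLOW.is_active(55.0))  # normal operation
print(FEED_FLOW.is_active(0.0))   # unit deliberately shut down, flow is zero
```

In the second call the alarm is active even though zero flow is expected during a shutdown, which is the limitation the paragraph describes.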

When a major piece of process equipment like a charge pump, compressor, or fired heater shuts down, many alarms become unnecessary. These alarms are no longer independent exceptions from normal operation. They indicate, in that situation, secondary, non-critical effects and no longer provide the operator with important information. Similarly, during startup or shutdown of a process unit, many alarms are not meaningful. This is often the case because the static alarm conditions conflict with the required operating criteria for startup and shutdown.

In all cases of major equipment failure, startups, and shutdowns, the operator must search alarm annunciation displays and analyze which alarms are significant. This wastes valuable time when the operator needs to make important operating decisions and take swift action. If the resultant flood of alarms becomes too great for the operator to comprehend, then the basic alarm management system has failed as a system that allows the operator to respond quickly and accurately to the alarms that require immediate action. In such cases, the operator has virtually no chance to minimize, let alone prevent, a significant loss.

In short, one needs to extend the objectives of alarm management beyond the basic level. It is not sufficient to utilize multiple priority levels, because priority itself is often dynamic. Likewise, neither disabling alarms based on unit association nor suppressing audible annunciation based on priority provides dynamic, selective alarm annunciation. The solution must be an alarm management system that can dynamically filter the process alarms based on the current plant operation and conditions, so that only the currently significant alarms are annunciated.
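
As a rough illustration of such dynamic filtering, the sketch below keeps only the alarms that are meaningful in the current plant mode. The mode names, tags, and the idea of storing a set of relevant modes per alarm are assumptions made for this example, not a description of any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    tag: str                   # hypothetical instrument tag and description
    priority: int              # 1 = most urgent
    relevant_modes: frozenset  # plant modes in which the alarm is meaningful


def annunciate(active_alarms, plant_mode):
    """Return the alarms that are significant in plant_mode, most urgent first."""
    significant = [a for a in active_alarms if plant_mode in a.relevant_modes]
    return sorted(significant, key=lambda a: a.priority)


ACTIVE = [
    Alarm("FI-101 low flow", 2, frozenset({"running"})),
    Alarm("TI-205 high temperature", 1, frozenset({"running", "startup", "shutdown"})),
    Alarm("PI-310 low pressure", 3, frozenset({"running", "startup"})),
]

for alarm in annunciate(ACTIVE, "shutdown"):
    print(alarm.tag)
```

Priority is kept fixed here for brevity; a fuller version would also let the priority depend on the plant mode, since priority itself is often dynamic.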

The fundamental purpose of dynamic alarm annunciation is to alert the operator to relevant abnormal operating situations. These are situations in which an operator response is necessary or possible to ensure:
  • Personnel and Environmental Safety,
  • Equipment Integrity,
  • Product Quality Control.

The ultimate objectives are no different from those of basic alarm annunciation management. Dynamic alarm annunciation management focuses the operator’s attention by eliminating extraneous alarms, providing better recognition of critical problems, and ensuring swifter, more accurate operator response.

The need for alarm management

Alarm management is usually necessary in a process manufacturing environment that is controlled by an operator using a control system, such as a DCS or a programmable logic controller (PLC). Such a system may have hundreds of individual alarms that, until very recently, have probably been designed with only limited consideration of the other alarms in the system. Since humans can only do one thing at a time and can pay attention to a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that can be assimilated by a human operator, particularly when the plant is upset or in an unusual condition. Alarms also need to be capable of directing the operator's attention to the most important problem that he or she needs to act upon, for instance by using a priority to indicate degree of importance or rank.

Some improvement methods

The techniques for achieving rate reduction range from the extremely simple ones of reducing nuisance and low value alarms to redesigning the alarm system in a holistic way that considers the relationships among individual alarms.

Nuisance reduction

The first step in a continuous improvement program is often to measure the alarm rate and to resolve any chronic problems, such as alarms that have no use (an alarm with no use is often described as one that does not require the operator to take an action). Note that the alarm rate, as measured by the alarms in a journal, is not necessarily the alarm rate seen by operators. This is because a chattering alarm will not update an unacknowledged alarm annunciator (and therefore looks like a single alarm to an operator), but will show up multiple times in a journal.
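
The difference between the journal count and the count seen by the operator can be sketched as follows. The journal format, a time-ordered list of (tag, event) pairs with "ALM" for an activation and "ACK" for an acknowledgement, is an assumption made for this example.

```python
def count_alarms(journal):
    """Return (journal_count, operator_seen_count) for a time-ordered journal.

    journal: iterable of (tag, event) pairs, where event is "ALM" or "ACK"
    (a hypothetical format chosen for this sketch).
    """
    journal_count = 0
    seen_count = 0
    unacknowledged = set()  # tags currently shown as unacknowledged
    for tag, event in journal:
        if event == "ALM":
            journal_count += 1
            # A repeat activation of an alarm that is still unacknowledged
            # does not change what the operator sees on the annunciator.
            if tag not in unacknowledged:
                seen_count += 1
                unacknowledged.add(tag)
        elif event == "ACK":
            unacknowledged.discard(tag)
    return journal_count, seen_count


journal = [
    ("LI-200", "ALM"), ("LI-200", "ALM"), ("LI-200", "ALM"),  # chattering
    ("LI-200", "ACK"),
    ("LI-200", "ALM"),
    ("PI-300", "ALM"),
]
print(count_alarms(journal))  # (5, 3)
```

Here the journal records five activations, while the operator was presented with three distinct alarms.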

Design guide

This step involves documenting the methodology or philosophy of how to design alarms. It can include things such as what to alarm, standards for alarm annunciation and text messages, and how the operator will interact with the alarms.

Documentation and rationalization

This phase is a detailed review of all alarms to document their design purpose, and to ensure that they are selected and set properly and meet the design criteria. Ideally this stage will result in a reduction of alarms, but it does not always.

Advanced methods

The above steps will often still fail to prevent an alarm flood in an operational upset, so advanced methods such as alarm suppression under certain circumstances are then necessary. As an example, shutting down a pump will always cause a low flow alarm on the pump outlet flow, so the low flow alarm may be suppressed if the pump was shut down: it adds no value for the operator, who already knows it was caused by the pump being shut down. This technique can of course get very complicated and requires considerable care in design. In the above case, for instance, it can be argued that the low flow alarm does add value, as it confirms to the operator that the pump has indeed stopped.
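
The pump example might be expressed as a small conditional rule. The function name and parameters below are hypothetical; the flag `suppress_when_stopped` is included only to show that the design decision discussed above, whether the alarm still adds value when the pump is stopped, can be made explicit rather than hard-coded.

```python
def low_flow_alarm(flow, low_limit, pump_running, suppress_when_stopped=True):
    """Decide whether to annunciate a low flow alarm on a pump outlet.

    All names and the suppression rule are illustrative assumptions.
    """
    in_alarm = flow < low_limit
    if in_alarm and not pump_running and suppress_when_stopped:
        # Low flow is the expected consequence of the pump being stopped,
        # so the alarm is not annunciated.
        return False
    return in_alarm


print(low_flow_alarm(flow=2.0, low_limit=10.0, pump_running=True))   # genuine low flow
print(low_flow_alarm(flow=0.0, low_limit=10.0, pump_running=False))  # suppressed
```

Setting `suppress_when_stopped=False` keeps the alarm as a confirmation that the pump has indeed stopped.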

Alarm management becomes more and more necessary as the complexity and size of manufacturing systems increase. Much of the need for alarm management also arises because alarms can be configured on a DCS at nearly zero incremental cost, whereas in the past, on physical control panel systems that consisted of individual pneumatic or electronic analog instruments, each alarm required expenditure and control panel real estate, so more thought usually went into the need for an alarm. Numerous disasters, such as the Three Mile Island accident and the Chernobyl accident, have established a clear need for alarm management.

The seven steps to alarm management

Step 1: Create and adopt an alarm philosophy

Develop a comprehensive design and guideline document that makes it clear “exactly how to do alarms right.”

Step 2: Alarm performance benchmarking

Analyze the alarm system to determine its strengths and deficiencies, and effectively map out a practical solution to improve it.

Step 3: “Bad actor” alarm resolution

From experience, it is known that around half of the entire alarm load usually comes from relatively few alarms. The methods for making them work properly are documented, and can be applied with minimum effort and maximum performance improvement.
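
One simple way to find such bad actors is to rank alarms by how often they appear in the journal and keep the most frequent ones until a chosen share of the total load is covered. The sketch below assumes the journal has already been reduced to a list of alarm tags; the 50% default share follows the rule of thumb above and is not a fixed requirement.

```python
from collections import Counter


def bad_actors(journal_tags, share=0.5):
    """Return the most frequent (tag, count) pairs that together account
    for at least `share` of all journal entries."""
    counts = Counter(journal_tags)
    total = sum(counts.values())
    selected, running = [], 0
    for tag, n in counts.most_common():
        if running >= share * total:
            break
        selected.append((tag, n))
        running += n
    return selected


# Hypothetical journal: two alarms dominate, 45 others occur once each.
tags = ["LI-200"] * 40 + ["FI-101"] * 15 + [f"TAG-{i}" for i in range(45)]
print(bad_actors(tags))  # [('LI-200', 40), ('FI-101', 15)]
```

In this made-up journal, two of the 47 distinct alarms produce 55 of the 100 entries.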

Step 4: Alarm documentation and rationalization (D&R)

A full overhaul of the alarm system to ensure that each alarm complies with the alarm philosophy and the principles of good alarm management.

Step 5: Alarm system audit and enforcement

DCS alarm systems are notoriously easy to change and generally lack proper security. Methods are needed to ensure that the alarm system does not drift from its rationalized state.
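
A basic audit can be sketched as a comparison between the rationalized master settings and the settings currently live in the control system. The dictionary layout and tag names below are assumptions for illustration; a real audit would read both sides from the master alarm database and the control system itself.

```python
def audit(master, live):
    """Return {tag: (expected, actual)} for every alarm whose live settings
    differ from the rationalized master, including added or missing alarms."""
    findings = {}
    for tag in sorted(master.keys() | live.keys()):
        expected = master.get(tag)  # None if the alarm is not in the master
        actual = live.get(tag)      # None if the alarm was removed from the system
        if expected != actual:
            findings[tag] = (expected, actual)
    return findings


master = {
    "FI-101": {"low": 20.0, "priority": 2},
    "TI-205": {"high": 350.0, "priority": 1},
}
live = {
    "FI-101": {"low": 5.0, "priority": 2},    # trip point was changed
    "TI-205": {"high": 350.0, "priority": 1},
    "LI-300": {"high": 90.0, "priority": 3},  # alarm added without review
}
for tag, (expected, actual) in audit(master, live).items():
    print(tag, expected, actual)
```

Enforcement would then either restore the master settings or route each finding through a management-of-change review.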

Step 6: Real-time alarm management

More advanced alarm management techniques are often needed to ensure that the alarm system properly supports, rather than hinders, the operator in all operating scenarios. These include Alarm Shelving, State-Based Alarming, and Alarm Flood Suppression technologies.
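
As an illustration of one of these techniques, alarm shelving is commonly understood as a temporary, operator-initiated suppression that expires on its own. The class below is a minimal sketch under that assumption; times are passed in explicitly as seconds so that the behavior is easy to follow, and all names are hypothetical.

```python
class AlarmShelf:
    """Time-limited shelving: a shelved alarm returns automatically."""

    def __init__(self):
        self._expiry = {}  # tag -> time (in seconds) at which shelving ends

    def shelve(self, tag, now, duration_s):
        self._expiry[tag] = now + duration_s

    def is_shelved(self, tag, now):
        expiry = self._expiry.get(tag)
        if expiry is None:
            return False
        if now >= expiry:
            # Shelving has expired, so the alarm is annunciated again.
            del self._expiry[tag]
            return False
        return True


shelf = AlarmShelf()
shelf.shelve("LI-200", now=0, duration_s=1800)  # shelve for 30 minutes
print(shelf.is_shelved("LI-200", now=600))      # still shelved
print(shelf.is_shelved("LI-200", now=1800))     # expired
```

The automatic expiry is the point of the design: a shelved alarm cannot be forgotten indefinitely, unlike an alarm that has simply been disabled.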

Step 7: Control and maintain alarm system performance

Proper management of change, longer-term analysis, and KPI monitoring are needed to ensure that the gains achieved by performing the steps above do not dwindle away over time. Otherwise they will; the principle of “entropy” definitely applies to an alarm system.
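
One KPI that is straightforward to track is the number of alarms per fixed time window. The sketch below assumes alarm activation times are available as seconds from the start of the log; the 10-minute window and the target value are example settings that a site would take from its own alarm philosophy.

```python
from collections import Counter


def alarms_per_window(timestamps_s, window_s=600):
    """Count alarm activations per fixed window (default: 10 minutes).

    Returns a Counter mapping window index -> number of alarms.
    """
    return Counter(int(t // window_s) for t in timestamps_s)


def windows_over_target(counts, target):
    """Return the sorted window indexes whose alarm count exceeds the target."""
    return sorted(w for w, n in counts.items() if n > target)


counts = alarms_per_window([10, 200, 590, 600, 1300])
print(dict(counts))                           # {0: 3, 1: 1, 2: 1}
print(windows_over_target(counts, target=2))  # [0]
```

Tracking the flagged windows over weeks or months shows whether the alarm system is drifting back toward its earlier state.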

See also

  • List of human-computer interaction topics, since most control systems are computer-based
  • Design, especially interaction design
  • Detection theory
  • First-out alarm
  • Physical security
  • Annunciator panel


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 