Safety engineering
Safety engineering is an applied science strongly related to systems engineering and industrial engineering, and to the subset known as system safety engineering. Safety engineering assures that a life-critical system behaves as needed even when components fail.

Overview

Ideally, safety engineers take an early design of a system, analyze it to find what faults can occur, and then propose safety requirements in the design specifications up front, along with changes to existing systems, to make the system safer. In an early design stage, a fail-safe system can often be made acceptably safe with a few sensors and some software to read them. Probabilistic fault-tolerant systems can often be built from a larger number of smaller, less expensive pieces of equipment.

Far too often, rather than actually influencing the design, safety engineers are assigned to prove that an existing, completed design is safe. If a safety engineer then discovers significant safety problems late in the design process, correcting them can be very expensive. This type of error has the potential to waste large sums of money.

The exception to this conventional approach is the way some large government agencies approach safety engineering from a more proactive and proven process perspective, known as "system safety". The system safety philosophy is applied to complex and critical systems, such as commercial airliners, complex weapon systems, spacecraft, rail and transportation systems, air traffic control systems, and other complex and safety-critical industrial systems. Proven system safety methods and techniques aim to prevent, eliminate, and control hazards and risks through design influence exerted by a collaboration of key engineering disciplines and product teams. Software safety is a fast-growing field because the functionality of modern systems is increasingly placed under the control of software. The whole concept of system safety and software safety, as a subset of systems engineering, is to influence safety-critical system designs by conducting several types of hazard analyses to identify risks and to specify design safety features and procedures that strategically mitigate risk to acceptable levels before the system is certified.

Additionally, failure mitigation can go beyond design recommendations, particularly in the area of maintenance. There is an entire realm of safety and reliability engineering known as Reliability Centered Maintenance (RCM), a discipline that is a direct result of analyzing potential failures within a system and determining maintenance actions that can mitigate the risk of failure. This methodology is used extensively on aircraft and involves understanding the failure modes of the serviceable replaceable assemblies in addition to the means to detect or predict an impending failure. Every automobile owner is familiar with this concept from taking a car in to have the oil changed or the brakes checked. Even filling up one's car with fuel is a simple example of a failure mode (failure due to fuel exhaustion), a means of detection (fuel gauge), and a maintenance action (filling the car's fuel tank).

For large-scale complex systems, hundreds if not thousands of maintenance actions can result from the failure analysis. These maintenance actions are based on conditions (e.g., a gauge reading or a leaky valve), hard conditions (e.g., a component is known to fail after 100 hours of operation with 95% certainty), or require inspection to determine the maintenance action (e.g., metal fatigue). The RCM concept then analyzes each individual maintenance item for its risk contribution to safety, mission, operational readiness, or cost to repair if a failure does occur. All of the maintenance actions are then bundled into maintenance intervals so that maintenance is not occurring around the clock but rather at regular intervals. This bundling process introduces further complexity, as it might stretch some maintenance cycles, thereby increasing risk, but shorten others, thereby potentially reducing risk. The end result is a comprehensive maintenance schedule, purpose-built to reduce operational risk and ensure acceptable levels of operational readiness and availability.
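
The bundling step can be illustrated with a small sketch. It is only an illustration under stated assumptions: the task names, the ideal intervals, the list of bundled intervals, and the "snap to the nearest interval" rule are all hypothetical, and a real RCM program would weigh the risk contribution of each item rather than use a simple nearest-value rule.

```python
def bundle_tasks(ideal_intervals, bundle_intervals):
    """Snap each task's ideal interval (in hours) to the nearest bundled interval.

    Returns {task: (chosen_interval, change)}, where change records whether the
    task's cycle was stretched, shortened, or left unchanged by the bundling.
    """
    schedule = {}
    for task, ideal in ideal_intervals.items():
        chosen = min(bundle_intervals, key=lambda b: abs(b - ideal))
        if chosen > ideal:
            change = "stretched"   # longer cycle than ideal: risk may increase
        elif chosen < ideal:
            change = "shortened"   # shorter cycle than ideal: risk may decrease
        else:
            change = "unchanged"
        schedule[task] = (chosen, change)
    return schedule


# Hypothetical tasks and intervals, chosen only for illustration.
ideal_intervals = {
    "oil change": 120,
    "brake inspection": 450,
    "filter replacement": 1000,
}
schedule = bundle_tasks(ideal_intervals, [100, 500, 1000])

for task, (interval, change) in schedule.items():
    print(f"{task}: every {interval} h ({change})")
```

The "stretched" entries are the ones a safety engineer would want to review first, since those are the cycles where bundling may have increased risk.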

Analysis techniques

Analysis techniques can be split into two categories: qualitative and quantitative methods. Both approaches share the goal of finding causal dependencies between a hazard at the system level and failures of individual components. Qualitative approaches focus on the question "What must go wrong for a system hazard to occur?", while quantitative methods aim to provide estimates of the probabilities, rates, and/or severity of consequences.

Traditionally, safety analysis techniques rely solely on the skill and expertise of the safety engineer. In the last decade, model-based approaches have become prominent. In contrast to traditional methods, model-based techniques try to derive relationships between causes and consequences from some sort of model of the system.

Traditional methods for safety analysis

The two most common fault modeling techniques are called failure mode and effects analysis and fault tree analysis. These techniques are just ways of finding problems and of making plans to cope with failures, as in probabilistic risk assessment. One of the earliest complete studies using this technique on a commercial nuclear plant was the WASH-1400 study, also known as the Reactor Safety Study or the Rasmussen Report.

Failure modes and effects analysis

Failure Mode and Effects Analysis (FMEA) is a bottom-up, inductive analytical method which may be performed at either the functional or piece-part level. For functional FMEA, failure modes are identified for each function in a system or equipment item, usually with the help of a functional block diagram. For piece-part FMEA, failure modes are identified for each piece-part component (such as a valve, connector, resistor, or diode). The effects of the failure mode are described and assigned a probability based on the failure rate and failure mode ratio of the function or component.

Failure modes with identical effects can be combined and summarized in a Failure Mode Effects Summary. When combined with criticality analysis, FMEA is known as Failure Mode, Effects, and Criticality Analysis or FMECA, pronounced "fuh-MEE-kuh".
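
As a rough sketch of the bookkeeping described above, the following shows one way a piece-part FMEA worksheet could be represented. The component names, failure rates, failure mode ratios, and effect descriptions are invented for illustration, and the class and function names are hypothetical. Each failure mode's rate is taken as the component failure rate multiplied by its failure mode ratio, and modes with identical effects are then summed into a simple effects summary.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    effect: str
    component_failure_rate: float  # failures per hour for the whole component
    mode_ratio: float              # fraction of component failures in this mode

    @property
    def mode_failure_rate(self) -> float:
        # Rate at which this particular failure mode occurs.
        return self.component_failure_rate * self.mode_ratio


def summarize_by_effect(modes):
    """Combine failure modes with identical effects by summing their rates."""
    summary = {}
    for m in modes:
        summary[m.effect] = summary.get(m.effect, 0.0) + m.mode_failure_rate
    return summary


# Illustrative worksheet rows; all numbers are assumptions, not field data.
worksheet = [
    FailureMode("valve", "stuck closed", "loss of cooling flow", 2e-6, 0.4),
    FailureMode("valve", "stuck open", "uncontrolled flow", 2e-6, 0.6),
    FailureMode("pump", "fails to run", "loss of cooling flow", 5e-6, 1.0),
]
summary = summarize_by_effect(worksheet)

for effect, rate in summary.items():
    print(f"{effect}: {rate:.2e} per hour")
```

A criticality analysis (as in FMECA) would add a severity classification to each row; that step is not shown here.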

Fault tree analysis

Fault tree analysis (FTA) is a top-down, deductive analytical method. In FTA, initiating primary events such as component failures, human errors, and external events are traced through Boolean logic gates to an undesired top event such as an aircraft crash or nuclear reactor core melt. The intent is to identify ways to make top events less probable, and to verify that safety goals have been achieved.

Fault trees are a logical inverse of success trees, and may be obtained by applying de Morgan's theorem to success trees (which are directly related to reliability block diagrams).
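
The relationship between a success tree and its fault tree can be checked on a small example. The sketch below assumes a hypothetical system that succeeds when component A works and at least one of B or C works. Applying de Morgan's theorem turns the AND into an OR and the OR into an AND over the failed states, and the loop confirms the two forms agree for every combination of component states.

```python
from itertools import product


def system_succeeds(a_ok, b_ok, c_ok):
    # Success tree: A works AND (B works OR C works).
    return a_ok and (b_ok or c_ok)


def system_fails(a_failed, b_failed, c_failed):
    # Fault tree obtained via de Morgan: A fails OR (B fails AND C fails).
    return a_failed or (b_failed and c_failed)


# Exhaustively compare the fault tree with the negated success tree.
equivalent = all(
    system_fails(not a, not b, not c) == (not system_succeeds(a, b, c))
    for a, b, c in product([True, False], repeat=3)
)
print("fault tree matches negated success tree:", equivalent)
```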

FTA may be qualitative or quantitative. When failure and event probabilities are unknown, qualitative fault trees may be analyzed for minimal cut sets. For example, if any minimal cut set contains a single base event, then the top event may be caused by a single failure. Quantitative FTA is used to compute top event probability, and usually requires computer software such as CAFTA from the Electric Power Research Institute or SAPHIRE from the Idaho National Laboratory.
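
For a very small tree, both kinds of analysis can be sketched in a few lines. This is a toy illustration, not how CAFTA or SAPHIRE work internally: the tree encoding (nested tuples), the event names, and the probabilities are assumptions made for the example, and the quantitative step uses the rare-event approximation (sum over minimal cut sets of the product of independent basic-event probabilities), which is only reasonable when all probabilities are small.

```python
from itertools import product
from math import isclose, prod


def cut_sets(node):
    """Return the cut sets of a fault tree node as a set of frozensets.

    A node is either a basic-event name (str) or a tuple (gate, children)
    with gate equal to "AND" or "OR".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set is enough to cause the gate output.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


def top_event_probability(min_cut_sets, probabilities):
    """Rare-event approximation, assuming independent basic events."""
    return sum(prod(probabilities[e] for e in cs) for cs in min_cut_sets)


# Hypothetical tree: cooling is lost if power is lost OR both pumps fail.
tree = ("OR", ["power_loss", ("AND", ["pump_a_fails", "pump_b_fails"])])
probabilities = {"power_loss": 1e-6, "pump_a_fails": 1e-3, "pump_b_fails": 1e-3}

mcs = minimal_cut_sets(tree)
single_points = {cs for cs in mcs if len(cs) == 1}
p_top = top_event_probability(mcs, probabilities)

print("minimal cut sets:", sorted(sorted(cs) for cs in mcs))
print("single points of failure:", sorted(sorted(cs) for cs in single_points))
print(f"approximate top event probability: {p_top:.2e}")
```

In this example the cut set containing only `power_loss` is the qualitative finding of interest: it shows that a single failure can cause the top event.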

Some industries use both fault trees and event trees. An event tree starts from an undesired initiator (loss of critical supply, component failure, etc.) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node is added to the tree with a split of probabilities of taking either branch. The probabilities of a range of "top events" arising from the initial event can then be seen.
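
The branching described above can be sketched as follows. The initiator frequency, the branch names, and the failure probabilities are invented for the example, and the function name is hypothetical. The sketch also simplifies by evaluating every branch on every path, whereas a real event tree usually prunes branches that no longer matter once an earlier system has succeeded or failed.

```python
from math import isclose


def event_tree_outcomes(initiator_frequency, branches):
    """Enumerate every path through a simple binary event tree.

    branches is an ordered list of (name, failure_probability). Returns a dict
    mapping each path, a tuple of (name, "works" or "fails") pairs, to the
    frequency of that path.
    """
    outcomes = {(): initiator_frequency}
    for name, p_fail in branches:
        next_outcomes = {}
        for path, frequency in outcomes.items():
            # Each existing path splits into a success and a failure branch.
            next_outcomes[path + ((name, "works"),)] = frequency * (1 - p_fail)
            next_outcomes[path + ((name, "fails"),)] = frequency * p_fail
        outcomes = next_outcomes
    return outcomes


# Hypothetical initiator: loss of critical supply, 0.01 occurrences per year.
branches = [("backup supply", 0.1), ("operator response", 0.05)]
outcomes = event_tree_outcomes(1e-2, branches)

worst_path = (("backup supply", "fails"), ("operator response", "fails"))
for path, frequency in outcomes.items():
    label = ", ".join(f"{name} {state}" for name, state in path)
    print(f"{label}: {frequency:.2e} per year")
```

Because every split conserves probability, the frequencies of all final outcomes add up to the frequency of the initiating event.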

Safety certification

Usually a failure in safety-certified systems is acceptable if, on average, less than one life per 10^9 hours of continuous operation is lost to failure. Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to this level. The cost versus loss of lives has been considered appropriate at this level (by the FAA for aircraft systems under the Federal Aviation Regulations).

Preventing failure

Once a failure mode is identified, it can usually be mitigated by adding extra or redundant equipment to the system. For example, nuclear reactors contain dangerous radiation, and nuclear reactions can generate so much heat that no substance could contain them. Therefore, reactors have emergency core cooling systems to keep the temperature down, shielding to contain the radiation, and engineered barriers (usually several, nested, surmounted by a containment building) to prevent accidental leakage. Safety-critical systems are commonly required to permit no single event or component failure to result in a catastrophic failure mode.

Most biological organisms have a certain amount of redundancy: multiple organs, multiple limbs, etc.

For any given failure, a fail-over or redundancy can almost always be designed and incorporated into a system.

Safety and reliability

Probabilistic risk assessment has created a close relationship between safety and reliability. Component reliability, generally defined in terms of component failure rate, and external event probability are both used in quantitative safety assessment methods such as FTA. Related probabilistic methods are used to determine system Mean Time Between Failure (MTBF), system availability, or probability of mission success or failure. Reliability analysis has a broader scope than safety analysis, in that non-critical failures are considered. On the other hand, higher failure rates are considered acceptable for non-critical systems.

Safety generally cannot be achieved through component reliability alone. Catastrophic failure probabilities of 10^-9 per hour correspond to the failure rates of very simple components such as resistors or capacitors. A complex system containing hundreds or thousands of components might be able to achieve an MTBF of 10,000 to 100,000 hours, meaning it would fail at a rate of 10^-4 or 10^-5 per hour. If a system failure is catastrophic, usually the only practical way to achieve a failure rate of 10^-9 per hour is through redundancy. Two redundant systems with independent failure modes, each having an MTBF of 100,000 hours, could achieve a failure rate on the order of 10^-10 per hour because of the multiplication rule for independent events.
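
The arithmetic in this paragraph can be reproduced with a short sketch. It rests on the same simplifications as the text: the failure rate is taken as 1/MTBF, the hourly rate is treated as the probability of failure within one hour, and the redundant channels are assumed to fail independently (no common-cause or undetected latent failures). The function names are hypothetical.

```python
from math import isclose, prod


def failure_rate_from_mtbf(mtbf_hours):
    """Approximate constant failure rate (per hour) for a given MTBF."""
    return 1.0 / mtbf_hours


def redundant_failure_probability(per_hour_probabilities):
    """Probability that all independent channels fail in the same hour."""
    return prod(per_hour_probabilities)


def channels_needed(single_channel_probability, target_probability):
    """Smallest number of identical independent channels meeting the target."""
    if not 0.0 < single_channel_probability < 1.0:
        raise ValueError("single-channel probability must be between 0 and 1")
    channels = 1
    while single_channel_probability ** channels > target_probability:
        channels += 1
    return channels


rate = failure_rate_from_mtbf(100_000)              # 1e-5 per hour
dual = redundant_failure_probability([rate, rate])  # about 1e-10 per hour
needed = channels_needed(rate, 1e-9)

print(f"single channel: {rate:.1e} per hour")
print(f"two independent channels: {dual:.1e} per hour")
print(f"channels needed for 1e-9 per hour: {needed}")
```

The independence assumption carries most of the weight here: a shared power supply or a common software fault would make the real figure far worse than the product suggests.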

When adding equipment is impractical (usually because of expense), then the least expensive form of design is often "inherently fail-safe". That is, change the system design so its failure modes are not catastrophic. Inherent fail-safes are common in medical equipment, traffic and railway signals, communications equipment, and safety equipment.

The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way (for nuclear power plants, this is termed a passively safe design, although more than ordinary failures are covered). Alternatively, if the system contains a hazard source such as a battery or rotor, then it may be possible to remove the hazard from the system so that its failure modes cannot be catastrophic. The U.S. Department of Defense Standard Practice for System Safety (MIL-STD-882) places the highest priority on elimination of hazards through design selection.

One of the most common fail-safe systems is the overflow tube in baths and kitchen sinks. If the valve sticks open, the excess water drains through the overflow tube rather than spilling over and causing damage. Another common example is that in an elevator the cable supporting the car keeps spring-loaded brakes open. If the cable breaks, the brakes grab the rails, and the elevator cabin does not fall.

Some systems can never be made fail-safe, because continuous availability is needed. For example, loss of engine thrust in flight is dangerous. Redundancy, fault tolerance, or recovery procedures are used for these situations (e.g., multiple independently controlled and fuel-fed engines). This also makes the system less sensitive to reliability prediction errors or quality-induced uncertainty in the separate items. On the other hand, failure detection and correction and the avoidance of common cause failures become increasingly important here to ensure system-level reliability.

Containing failure

It is common practice to plan for the failure of safety systems through containment and isolation methods. The use of isolating valves, also known as the block and bleed manifold, is very common in isolating pumps, tanks, and control valves that may fail or need routine maintenance. In addition, nearly all tanks containing oil or other hazardous chemicals are required to have containment barriers set up around them to contain 100% of the volume of the tank in the event of a catastrophic tank failure. Similarly, in a long pipeline, there are remote-closing valves at regular intervals so that a leak can be isolated. Fault isolation boundaries are similarly designed into critical electronic systems or computer software. The goal of all containment systems is to provide a means of mitigating the consequences of failure.
Fault isolation might also refer to the extent to which detected failures can be isolated for successful recovery. The isolation level indicates the system indenture level at which the failure cause can be recovered (often by replacement of a line-replaceable unit).

See also

  • ARP4761
  • Earthquake engineering
  • Effective Safety Training
  • Forensic engineering
  • Hazard and operability study
  • Industrial Engineering
  • IEC 61508
  • Nuclear safety
  • Process Safety Management
  • Risk assessment
  • Risk management
  • Safety life cycle
  • Workplace safety
  • Zonal Safety Analysis


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.