Intermittent Fault
An intermittent fault, often called simply an "intermittent", is a malfunction that occurs at intervals, usually irregular, in a device or system that functions normally at other times. Intermittent faults are common to all branches of technology, including computer software. An intermittent fault is caused by several contributing factors occurring simultaneously, some of which may be effectively random. The more complex the system or mechanism involved, the greater the likelihood of an intermittent fault.
A simple example of an effectively random cause in a physical system is a borderline electrical connection in the wiring or in a component of a circuit. Here (cause 1, the cause that must be identified and rectified) two conductors are very close, and whether they actually establish a connection that allows enough current to flow for correct operation depends on (cause 2, which need not be identified) a minor change in temperature, vibration, orientation, voltage, etc. (Sometimes this particular case is described as an "intermittent connection" rather than a "fault".) In computer software, a program may (cause 1) fail to initialise a variable which is required to be initially zero; if the program is run in circumstances such that memory is almost always clear before it starts, it will malfunction only on the rare occasions when (cause 2) the memory where the variable is stored happens to be non-zero beforehand.
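The software example above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: Python does not expose uninitialised memory, so a hypothetical `ScratchMemory` class stands in for memory that is usually, but not always, clear, and the names `count_events_buggy`, `count_events_fixed` and `failure_count` are invented for illustration.

```python
import random


class ScratchMemory:
    """Toy stand-in (an assumption) for memory that is usually clear."""

    def __init__(self, size, dirty_probability, seed):
        self._rng = random.Random(seed)  # seeded so a run can be repeated
        self._size = size
        self._dirty_probability = dirty_probability

    def allocate(self):
        # Cause 2: occasionally the block still holds non-zero leftovers.
        if self._rng.random() < self._dirty_probability:
            return [self._rng.randint(1, 255) for _ in range(self._size)]
        return [0] * self._size


def count_events_buggy(memory, events):
    # Cause 1: slot 0 is used as a counter but is never set to zero first.
    for _ in events:
        memory[0] += 1
    return memory[0]


def count_events_fixed(memory, events):
    memory[0] = 0  # explicit initialisation removes the dependence on old contents
    for _ in events:
        memory[0] += 1
    return memory[0]


def failure_count(counter, runs, dirty_probability=0.02, seed=1):
    """Run `counter` repeatedly and count how often it returns a wrong total."""
    pool = ScratchMemory(size=4, dirty_probability=dirty_probability, seed=seed)
    events = ["a", "b", "c"]
    return sum(
        1 for _ in range(runs)
        if counter(pool.allocate(), events) != len(events)
    )


if __name__ == "__main__":
    print("buggy failures:", failure_count(count_events_buggy, 1000))
    print("fixed failures:", failure_count(count_events_fixed, 1000))
```

With a small `dirty_probability` the buggy counter is right on most runs and wrong on a few, which is the pattern that makes such a defect hard to catch. The fixed counter is correct whatever the prior contents of the memory.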
Intermittent faults are notoriously difficult to identify and repair ("troubleshoot") because no individual factor creates the problem alone, so the factors can only be identified while the malfunction is actually occurring. In addition, the person capable of identifying and solving the problem is seldom the usual operator. Because the timing of the malfunction is unpredictable, and both device or system downtime and engineers' time incur cost, a fault that is not too frequent is often simply tolerated unless it causes unacceptable problems or dangers. For example, some intermittent faults in medical life support equipment can kill a patient.
If an intermittent fault occurs for long enough during troubleshooting, it can be identified and resolved in the usual way.
Some techniques to resolve intermittent faults are:
- Automatic logging of relevant parameters over a long enough time for the fault to manifest can help; parameter values at the time of the fault may identify the cause so that appropriate remedial action can be taken.
- Changing operating circumstances while the fault is present to see if the fault temporarily clears or changes. For example, components can be tapped, cooled with freezer spray, or heated. A time-honoured, and sometimes effective, way for a user to clear an intermittent fault in domestic electronics is to hit the cabinet, though this is less likely to work with modern semiconductor electronics and printed circuits.
- A database of similar faults which have been resolved in identical or similar equipment.
- Precautionary changes, without attempting to pinpoint the fault. For example, in much electronic equipment aluminium electrolytic capacitors subject to high ripple currents can be changed as a routine measure, without bothering to troubleshoot the fault at all. Connectors can be disconnected and reseated. This is sometimes a measure of desperation; things are changed until the fault stops happening, and it is hoped that it is actually resolved rather than dormant.
- In electrical systems and cable systems, time domain reflectometry techniques are used: pulses are sent down electric wiring and the pulses reflected back are examined for anomalies, for example intermittent leakage during the stresses of aircraft operation.
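The automatic logging technique in the list above might look roughly like the following Python sketch. It keeps a short rolling history of parameter samples and saves a copy whenever a fault is flagged, so that the values at the time of the fault can be compared afterwards. The class `FaultRecorder`, the helper `fault_time_ranges`, the parameter names `temperature_c` and `supply_v`, and all readings are made up for illustration; how a fault is detected is assumed to be supplied from outside.

```python
from collections import deque


class FaultRecorder:
    """Keeps the most recent samples and saves a copy whenever a fault is flagged."""

    def __init__(self, history=3):
        self._recent = deque(maxlen=history)  # oldest samples drop off automatically
        self.snapshots = []

    def record(self, sample, fault_present):
        self._recent.append(dict(sample))
        if fault_present:
            # Freeze the lead-up to the fault, including the faulty sample itself.
            self.snapshots.append(list(self._recent))


def fault_time_ranges(snapshots):
    """For each parameter, return (min, max) over the samples taken at fault time."""
    ranges = {}
    for snapshot in snapshots:
        at_fault = snapshot[-1]  # the last sample in a snapshot is the faulty one
        for name, value in at_fault.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


# Hypothetical readings: (parameters, whether the fault was observed).
READINGS = [
    ({"temperature_c": 21.0, "supply_v": 5.02}, False),
    ({"temperature_c": 22.5, "supply_v": 5.01}, False),
    ({"temperature_c": 23.0, "supply_v": 4.70}, True),
    ({"temperature_c": 35.0, "supply_v": 5.00}, False),
    ({"temperature_c": 21.5, "supply_v": 4.65}, True),
]

recorder = FaultRecorder(history=3)
for sample, fault_present in READINGS:
    recorder.record(sample, fault_present)

print(fault_time_ranges(recorder.snapshots))
```

In this invented data the supply voltage at fault time sits in a narrow band below every normal reading, while the fault-time temperatures overlap the normal ones, which would point the investigation at the supply rather than at heat. Real logs are rarely this tidy, and the rolling history length is a trade-off between storage and how much lead-up is preserved.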
External links
- http://www.chiark.greenend.org.uk/~sgtatham/bugs.html (a discussion of software debugging)
- Sci.electronics.repair FAQ, see section "Troubleshooting of Intermittent Problems"