Hot spare
A hot spare or hot standby is used as a failover mechanism to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, a hot standby can refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.

Examples

Examples of hot spares are components such as A/V switches, computers, network printers, and hard disks. The equipment is powered on, or considered "hot," but not actively functioning in (i.e. used by) the system.

Electrical generators may be held on hot standby, or a steam train may be held at the shed fired up (literally hot), ready to replace an engine that fails in service.

Explanation

In designing a reliable system, it is recognized that there will be failures. At the extreme, a complete system can be duplicated and kept up to date—so in the event of the primary system failing, the secondary system can be switched in with little or no interruption. More often, a hot spare is a single vital component without which the entire system would fail. The spare component is integrated into the system in such a way that in the event of a problem, the system can be altered to use the spare component. This may be done automatically or manually, but in either case it is normal to have some means of error detection. A hot spare does not necessarily give 100% availability or protect against temporary loss of the system during the switching process; it is designed to significantly reduce the time that the system is unavailable.
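The switch-over described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Component` and `FailoverPair` classes and the request names are hypothetical, and error detection is reduced to catching an exception raised by the active component.

```python
from typing import Optional


class Component:
    """A hypothetical component that either works or has failed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class FailoverPair:
    """Sends work to the active component and switches to the spare on failure."""

    def __init__(self, primary: Component, spare: Component) -> None:
        self.active = primary
        self.spare: Optional[Component] = spare

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Error detection: the active component raised, so switch over.
            if self.spare is None:
                raise  # no spare left, so the system is unavailable
            self.active, self.spare = self.spare, None
            # Retry the request on the newly active component.
            return self.active.handle(request)


primary = Component("primary")
spare = Component("spare")
pair = FailoverPair(primary, spare)

first = pair.handle("job-1")   # served by the primary
primary.healthy = False        # simulate a failure of the primary
second = pair.handle("job-2")  # detected, switched, served by the spare
```

After the switch the pair has no spare left, which mirrors the point that a hot spare shortens an outage rather than guaranteeing 100% availability.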

Hot standby can carry a slightly different connotation from hot spare: it describes a state (active but not productive) rather than an object. For example, in a national power grid, the supply of power must be balanced against demand over the short term. It can take many hours to bring a coal-fired power station up to productive temperatures. To allow for load balancing, generator turbines may be kept running with the generators switched off, so that as peaks of demand occur the generators can rapidly be switched on to balance the load. This state of being ready to run is known as hot standby. Similarly, though it is not a modern phenomenon, steam train operators might hold a spare steam engine at a terminus fired up, as starting an engine from cold would take a significant amount of time.

The spare may be a similar component or system, or it may be a system of reduced performance, designed to cope for the time it takes to repair and recover the original component. In high-availability systems, it is common to design so that not only is there a spare that can quickly be switched in, but also that the failed component can be repaired or replaced without stopping the system; this is known as hot swapping. Alternatively, the probability of a second failure may be considered low, and the system is therefore designed simply to allow operation to continue until a suitable maintenance period. The appropriate solution is normally determined by balancing the cost of implementing the availability against the likelihood of a problem and the severity of that problem.

Computer usage

A hot spare disk is a disk or group of disks used to replace a failing or failed disk in a RAID configuration, either automatically or manually depending on the hot spare policy. The hot spare disk reduces the mean time to recovery (MTTR) for the RAID redundancy group, thus reducing the probability of a second disk failure and the resultant data loss that would occur in any singly redundant RAID (e.g., RAID-1, RAID-5, RAID-10). Typically, a hot spare is available to replace any of a number of different disks. Systems employing a hot spare normally require a redundant group, to allow time for the data to be regenerated onto the spare disk. During this time the system is exposed to data loss from a subsequent failure, so automatic switching to a spare disk reduces the time of exposure to that risk compared with manual discovery and replacement.
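The effect of a shorter exposure window can be illustrated with a rough calculation. The sketch below assumes independent disks with exponentially distributed lifetimes, which is a simplification; the disk count, mean time between failures, rebuild time, and manual response delay are illustrative values chosen for the example, not measurements. `remaining_disks` stands for the number of disks whose failure during the window would cause data loss, which depends on the RAID level.

```python
import math


def second_failure_probability(remaining_disks: int,
                               exposure_hours: float,
                               mtbf_hours: float) -> float:
    """Probability that at least one remaining disk fails during the window.

    Assumes independent disks with exponentially distributed lifetimes.
    """
    return 1.0 - math.exp(-remaining_disks * exposure_hours / mtbf_hours)


# Illustrative assumptions only.
REMAINING_DISKS = 7            # disks whose loss would be fatal during rebuild
MTBF_HOURS = 1_000_000.0       # assumed mean time between failures per disk
REBUILD_HOURS = 10.0           # assumed time to regenerate data onto the spare
MANUAL_DELAY_HOURS = 48.0      # assumed delay before someone swaps a disk by hand

# With a hot spare, the rebuild starts immediately.
p_hot_spare = second_failure_probability(
    REMAINING_DISKS, REBUILD_HOURS, MTBF_HOURS)

# Without one, the exposure window also includes the manual response delay.
p_manual = second_failure_probability(
    REMAINING_DISKS, MANUAL_DELAY_HOURS + REBUILD_HOURS, MTBF_HOURS)

print(f"hot spare: {p_hot_spare:.6f}")
print(f"manual:    {p_manual:.6f}")
```

Under this model the risk grows with the length of the exposure window, so removing the manual delay lowers the chance of a second failure regardless of the specific figures chosen.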

The concept of hot spares is not limited to hardware; software systems can also be held in a state of readiness. For example, a database server may have a software copy on hot standby, possibly even on the same machine, to cope with the various factors that make a database unreliable, such as disk failure, poorly written queries, or database software errors.

Hot standby operation in railway signalling

In hot standby operation, at least two units of the same type are powered up, receiving the same set of inputs, performing identical computations and producing identical outputs in a nearly synchronous manner. The outputs are typically physical outputs (individual ON/OFF digital signals, or analog signals) or serial data messages wrapped in suitable protocols, depending on their intended use. Outputs from only one unit (designated as the master or on-line unit by application logic) are used to control external devices (such as switches, signals, and on-board propulsion/braking control devices) or simply to provide displays. The other unit is the hot standby or hot spare unit, ready to take over if the master unit fails. When the master unit fails, an automatic failover to the hot spare occurs within a very short time, and the outputs from the hot spare, now the master unit, are delivered to the controlled devices and displays. The controlled devices and displays may experience a short blip or disturbance during the failover. However, they can be designed to tolerate or ignore such disturbances so that overall system operation is not affected.
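The arrangement can be sketched as a toy model. The `Unit` and `HotStandbyPair` classes, the input names, and the single output rule below are all hypothetical and chosen only for illustration; real signalling equipment involves safety mechanisms that this sketch does not attempt to represent.

```python
from typing import Dict, Optional

Inputs = Dict[str, bool]
Outputs = Dict[str, bool]


class Unit:
    """One of two identical processing units (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False

    def compute(self, inputs: Inputs) -> Optional[Outputs]:
        if self.failed:
            return None  # a failed unit produces no output
        # Hypothetical rule: clear the signal only if the track is clear
        # and the switch is locked.
        return {"signal_clear": inputs["track_clear"] and inputs["switch_locked"]}


class HotStandbyPair:
    """Feeds both units every cycle but delivers only the master's outputs."""

    def __init__(self, unit_a: Unit, unit_b: Unit) -> None:
        self.master = unit_a
        self.standby = unit_b

    def cycle(self, inputs: Inputs) -> Optional[Outputs]:
        # Both units receive the same inputs and compute in every cycle.
        master_out = self.master.compute(inputs)
        standby_out = self.standby.compute(inputs)
        if master_out is None and standby_out is not None:
            # Automatic failover: the standby becomes the master.
            self.master, self.standby = self.standby, self.master
            master_out = standby_out
        return master_out  # None if both units have failed


unit_a, unit_b = Unit("A"), Unit("B")
pair = HotStandbyPair(unit_a, unit_b)
inputs = {"track_clear": True, "switch_locked": True}

before = pair.cycle(inputs)   # outputs taken from unit A
unit_a.failed = True          # simulate a failure of the master
after = pair.cycle(inputs)    # outputs now taken from unit B
```

Because the standby has been computing on the same inputs all along, its outputs are ready in the same cycle in which the master fails, which is what keeps the failover disturbance short.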
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 