Therac-25
The Therac-25 was a radiation therapy machine produced by Atomic Energy of Canada Limited (AECL) after the Therac-6 and Therac-20 units (the earlier units had been produced in partnership with CGR of France).
Therac 25 user interface


PATIENT NAME   : JOHN DOE
TREATMENT MODE : FIX      BEAM TYPE: X      ENERGY (MeV): 25

                            ACTUAL    PRESCRIBED
UNIT RATE/MINUTE            0         200
MONITOR UNITS               50 50     200
TIME (MIN)                  0.27      1.00

GANTRY ROTATION (DEG)       0.0       0           VERIFIED
COLLIMATOR ROTATION (DEG)   359.2     359         VERIFIED
COLLIMATOR X (CM)           14.2      14.3        VERIFIED
COLLIMATOR Y (CM)           27.2      27.3        VERIFIED
WEDGE NUMBER                1         1           VERIFIED
ACCESSORY NUMBER            0         0           VERIFIED

DATE   : 84-OCT-26    SYSTEM : BEAM READY     OP.MODE: TREAT    AUTO
TIME   : 12:55. 8     TREAT  : TREAT PAUSE             X-RAY    173777
OPR ID : T25VO2-RO3   REASON : OPERATOR       COMMAND:

It was involved in at least six accidents between 1985 and 1987, in which patients were given massive overdoses of radiation, approximately 100 times the intended dose. These accidents highlighted the dangers of software control of safety-critical systems, and they have become a standard case study in health informatics and software engineering.

Problem description

The machine offered two modes of radiation therapy:
  • Direct electron-beam therapy, which delivered low doses of high-energy (5 MeV to 25 MeV) electrons over short periods of time;
  • Megavolt X-ray therapy, which delivered X-rays produced by colliding high-energy (25 MeV) electrons into a "target".


When operating in direct electron-beam therapy mode, a low-powered electron beam was emitted directly from the machine, then spread to a safe concentration using scanning magnets. When operating in megavolt X-ray mode, the machine was designed to rotate four components into the path of the electron beam: a target, which converted the electron beam into X-rays; a flattening filter, which spread the beam out over a larger area; a set of movable blocks (also called a collimator), which shaped the X-ray beam; and an X-ray ion chamber, which measured the strength of the beam.

The accidents occurred when the high-power electron beam was activated instead of the intended low-power beam, and without the beam spreader plate rotated into place. The machine's software did not detect that this had occurred, and therefore did not prevent the patient from receiving a potentially lethal dose of beta radiation. The high-powered electron beam struck the patients with approximately 100 times the intended dose of radiation, causing a feeling described by patient Ray Cox as "an intense electric shock". It caused him to scream and run out of the treatment room. Several days later, radiation burns appeared and the patients showed the symptoms of radiation poisoning. In three cases, the injured patients later died from radiation poisoning.
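
The kind of consistency check that was missing can be sketched in a few lines of Python. This is only an illustration, not the machine's actual logic: the names `BeamPower`, `Turntable`, and `beam_permitted` are hypothetical, and the two turntable positions are a simplification of the hardware described above.

```python
from enum import Enum


class BeamPower(Enum):
    LOW = "low"    # direct electron-beam therapy
    HIGH = "high"  # used only to generate X-rays


class Turntable(Enum):
    # Hypothetical labels for what sits in the path of the beam.
    ELECTRON = "beam-spreading hardware in place"
    XRAY = "target, flattening filter and ion chamber in place"


def beam_permitted(power: BeamPower, turntable: Turntable) -> bool:
    """Return True only for the two safe pairings of power and hardware."""
    safe = {
        (BeamPower.LOW, Turntable.ELECTRON),
        (BeamPower.HIGH, Turntable.XRAY),
    }
    return (power, turntable) in safe


# The accident configuration: high power without the X-ray hardware in place.
print(beam_permitted(BeamPower.HIGH, Turntable.ELECTRON))  # False
```

Listing the safe pairings explicitly, rather than the unsafe ones, means any combination nobody thought about is refused by default.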

The software flaw is recognized as a race condition.
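
A race condition of this general shape can be sketched as follows. The example is a single-threaded Python simulation that forces the unlucky interleaving deterministically; it is not a reconstruction of the Therac-25 code. The names `Console`, `racy_setup`, and `checked_setup` are hypothetical, and the version counter is just one possible remedy chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Console:
    """Settings written by the operator's side of the program (hypothetical)."""
    mode: str = "X"   # "X" = X-ray, "E" = electron
    version: int = 0  # bumped on every operator edit

    def edit(self, mode: str) -> None:
        self.mode = mode
        self.version += 1


def racy_setup(console: Console, edit_during_setup=None) -> str:
    """Read the mode once, then spend time setting up the hardware."""
    mode_used = console.mode
    if edit_during_setup:
        edit_during_setup()  # the operator corrects the entry mid-setup
    return mode_used         # the correction is never noticed


def checked_setup(console: Console, edit_during_setup=None) -> str:
    """Repeat the setup until no edit arrived while it was running."""
    while True:
        seen, mode_used = console.version, console.mode
        if edit_during_setup:
            edit_during_setup()
            edit_during_setup = None  # the simulated edit happens only once
        if console.version == seen:
            return mode_used


c = Console()
print(racy_setup(c, lambda: c.edit("E")))      # X (stale; the console now says E)
c2 = Console()
print(checked_setup(c2, lambda: c2.edit("E")))  # E
```

In a truly concurrent program the version and the mode would also have to be read together under a lock or an equivalent mechanism; the simulation leaves that out to keep the interleaving visible.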

Root causes

A commission concluded that the accidents should be attributed primarily to poor software design and development practices, rather than to the several individual coding errors that were found. In particular, the software was designed in a way that made clean, automated testing realistically impossible.

Researchers who investigated the accidents found several contributing causes. These included the following institutional causes:
  • AECL did not have the software code independently reviewed.
  • AECL did not consider the design of the software during its assessment of how the machine might produce the desired results and what failure modes existed. These form parts of the general techniques known as reliability modeling and risk management.
  • The system noticed that something was wrong and halted the X-ray beam, but merely displayed the word "MALFUNCTION" followed by a number from 1 to 64. The user manual did not explain or even address the error codes, so the operator pressed the P key to override the warning and proceed anyway.
  • AECL personnel, as well as machine operators, initially did not believe complaints. This was likely due to overconfidence.
  • AECL had never tested the Therac-25 with the combination of software and hardware until it was assembled at the hospital.


The researchers also found several engineering issues:
  • The failure only occurred when a particular nonstandard sequence of keystrokes was entered on the VT-100 terminal that controlled the PDP-11 computer: an "X" to (erroneously) select 25 MV photon mode followed by "cursor up", "E" to (correctly) select 25 MeV electron mode, then "Enter", all within eight seconds. This sequence of keystrokes was improbable, and so the problem did not occur very often and went unnoticed for a long time.
  • The design did not have any hardware interlocks to prevent the electron beam from operating in its high-energy mode without the target in place.
  • The engineer had reused software from older models. These models had hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, so there was no indication of the existence of faulty software commands.
  • The hardware provided no way for the software to verify that sensors were working correctly (see open-loop controller). The table-position system was the first implicated in Therac-25's failures; the manufacturer revised it with redundant switches to cross-check their operation.
  • The equipment control task did not properly synchronize with the operator interface task, so that race conditions occurred if the operator changed the setup too quickly. This was missed during testing, since it took some practice before operators were able to work quickly enough to trigger this failure mode.
  • The software set a flag variable by incrementing it. Occasionally an arithmetic overflow occurred, causing the software to bypass safety checks.
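
The overflow described in the last item can be illustrated with a short Python sketch. It assumes, for illustration only, a one-byte flag in which any nonzero value means "a check is still required" and zero means "nothing to check"; the function names and the flag width are hypothetical and are not taken from the machine's code.

```python
def set_flag_by_increment(flag: int) -> int:
    """Mark 'check required' by adding 1, wrapping like an 8-bit register."""
    return (flag + 1) & 0xFF


def set_flag_by_assignment(flag: int) -> int:
    """Mark 'check required' with a fixed nonzero value, which cannot wrap."""
    return 1


def check_is_skipped(flag: int) -> bool:
    """Zero is read as 'nothing to check'."""
    return flag == 0


flag = 0
skipped_on = []
for n in range(1, 513):
    flag = set_flag_by_increment(flag)
    if check_is_skipped(flag):
        skipped_on.append(n)

print(skipped_on)  # [256, 512]
```

Under these assumptions the check is silently skipped on every 256th pass, which is why such a fault shows up only occasionally. Assigning a constant instead of incrementing removes the wrap-around entirely.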


The software was written in assembly language, which may demand more care in testing and design. However, the choice of language by itself is not listed as a primary cause in the report. The machine also used its own operating system.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 