Pipeline (software)
Encyclopedia
In software engineering
Software engineering
Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software...

, a pipeline consists of a chain of processing elements (processes
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...

, threads
Thread (computer science)
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process...

, coroutine
Coroutine
Coroutines are computer program components that generalize subroutines to allow multiple entry points for suspending and resuming execution at certain locations...

s, etc..), arranged so that the output of each element is the input of the next. Usually some amount of buffering
Buffer (computer science)
In computer science, a buffer is a region of a physical memory storage used to temporarily hold data while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device or just before it is sent to an output device...

 is provided between consecutive elements. The information that flows in these pipelines is often a stream of records
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

, bytes
Byte stream
In computer science, a byte stream is a bit stream, in which data bits are grouped into units, called bytes.In computer networking the term octet stream is sometimes used to refer to the same thing; it emphasizes the use of bytes having the length of 8 bits, known as octets.Formally, a byte stream...

 or bits
Bitstream
A bitstream or bit stream is a time series of bits.A bytestream is a series of bytes, typically of 8 bits each, and can be regarded as a special case of a bitstream....

.

The concept is also called the pipes and filters design pattern. It was named by analogy to a physical pipeline
Pipeline transport
Pipeline transport is the transportation of goods through a pipe. Most commonly, liquids and gases are sent, but pneumatic tubes that transport solid capsules using compressed air are also used....

.

Multiprocessed pipelines

Pipelines are often implemented in a multitasking
Computer multitasking
In computing, multitasking is a method where multiple tasks, also known as processes, share common processing resources such as a CPU. In the case of a computer with a single CPU, only one task is said to be running at any point in time, meaning that the CPU is actively executing instructions for...

 OS
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

, by launching all elements at the same time as processes, and automatically servicing the data read requests by each process with the data written by the upstream process. In this way, the CPU
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

 will be naturally switched among the processes by the scheduler
Scheduling (computing)
In computer science, a scheduling is the method by which threads, processes or data flows are given access to system resources . This is usually done to load balance a system effectively or achieve a target quality of service...

 so as to minimize its idle time. In other common models, elements are implemented as lightweight threads or as coroutines to reduce the OS overhead often involved with processes. Depending upon the OS, threads may be scheduled directly by the OS or by a thread manager. Coroutines are always scheduled by a coroutine manager of some form.

Usually, read and write requests are blocking operations, which means that the execution of the source process, upon writing, is suspended until all data could be written to the destination process, and, likewise, the execution of the destination process, upon reading, is suspended until at least some of the requested data could be obtained from the source process. Obviously, this cannot lead to a deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

, where both processes would wait indefinitely for each other to respond, since at least one of the two processes will soon thereafter have its request serviced by the operating system, and continue to run.

For performance, most operating systems implementing pipes use pipe buffers
Buffer (computer science)
In computer science, a buffer is a region of a physical memory storage used to temporarily hold data while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device or just before it is sent to an output device...

, which allow the source process to provide more data than the destination process is currently able or willing to receive. Under most Unices and Unix-like operating systems, a special command is also available which implements a pipe buffer of potentially much larger and configurable size, typically called "buffer". This command can be useful if the destination process is significantly slower than the source process, but it is anyway desired that the source process can complete its task as soon as possible. E.g., if the source process consists of a command which reads an audio track from a CD and the destination process consists of a command which compresses the waveform
Waveform
Waveform means the shape and form of a signal such as a wave moving in a physical medium or an abstract representation.In many cases the medium in which the wave is being propagated does not permit a direct visual image of the form. In these cases, the term 'waveform' refers to the shape of a graph...

 audio data to a format like MP3
MP3
MPEG-1 or MPEG-2 Audio Layer III, more commonly referred to as MP3, is a patented digital audio encoding format using a form of lossy data compression...

. In this case, buffering the entire track in a pipe buffer would allow the CD drive to spin down more quickly, and enable the user to remove the CD from the drive before the encoding process has finished.

Such a buffer command can be implemented using available operating system primitive
Primitive
Primitive may refer to:* Anarcho-primitivism, an anarchist critique of the origins and progress of civilization* Primitive culture, one that lacks major signs of economic development or modernity...

s for reading and writing data. Wasteful busy waiting
Busy waiting
In software engineering, busy-waiting or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input is available, or if a lock is available. Spinning can also be used to generate an arbitrary time delay, a technique that was necessary...

 can be avoided by using facilities such as poll or select
Select (Unix)
select is a system call and application programming interface in Unix-like and POSIX-compliant operating systems for examining the status of file descriptors of open input/output channels...

 or multithreading
Thread (computer science)
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process...

.

VM/CMS and MVS

CMS Pipelines
Hartmann pipeline
A Hartmann pipeline is an extension of the Unix pipeline concept, providing for more complex paths, multiple input/output streams, and other features. It is an example and extension of Pipeline programming....

 is a port of the pipeline idea to VM/CMS and MVS
MVS
Multiple Virtual Storage, more commonly called MVS, was the most commonly used operating system on the System/370 and System/390 IBM mainframe computers...

 systems. It supports much more complex pipeline structures than Unix shells, with steps taking multiple input streams and producing multiple output streams. (Such functionality is supported by the Unix kernel, but few programs use it as it makes for complicated syntax and blocking modes, although some shells do support it via arbitrary file descriptor
File descriptor
In computer programming, a file descriptor is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems...

 assignment). Due to the different nature of IBM mainframe operating systems, it implements many steps inside CMS Pipelines which in Unix are separate external programs, but can also call separate external programs for their functionality. Also, due to the record-oriented nature of files on IBM mainframes, pipelines operate in a record-oriented, rather than stream-oriented manner.

Pseudo-pipelines

On single-tasking operating systems, the processes of a pipeline have to be executed one by one in sequential order; thus the output of each process must be saved to a temporary file
Temporary file
Temporary files may be created by computer programs for a variety of purposes; principally when a program cannot allocate enough memory for its tasks, when the program is working on data bigger than the architecture's address space, or as a primitive form of inter-process communication.- Auxiliary...

, which is then read by the next process. Since there is no parallelism or CPU
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

 switching, this version is called a "pseudo-pipeline".

For example, the command line interpreter of MS-DOS
MS-DOS
MS-DOS is an operating system for x86-based personal computers. It was the most commonly used member of the DOS family of operating systems, and was the main operating system for IBM PC compatible personal computers during the 1980s to the mid 1990s, until it was gradually superseded by operating...

 ('COMMAND.COM') provides pseudo-pipelines with a syntax superficially similar to that of Unix pipelines
Pipeline (Unix)
In Unix-like computer operating systems , a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process feeds directly as input to the next one. Each connection is implemented by an anonymous pipe...

. The command "dir | sort | more" would have been executed like this (albeit with more complicated temporary file names):
  1. Create temporary file 1.tmp
  2. Run command "dir", redirecting its output to 1.tmp
  3. Create temporary file 2.tmp
  4. Run command "sort", redirecting its input to 1.tmp and its output to 2.tmp
  5. Run command "more", redirecting its input to 2.tmp, and presenting its output to the user
  6. Delete 1.tmp and 2.tmp, which are no longer needed
  7. Return to the command prompt
    Command Prompt
    Command Prompt is the Microsoft-supplied command-line interpreter on OS/2, Windows CE and on Windows NT-based operating systems...



All temporary files are stored in the directory pointed to by %TEMP%, or the current directory if %TEMP% isn't set.

Thus, pseudo-pipes acted like true pipes with a pipe buffer of unlimited size (disk space limitations notwithstanding), with the significant restriction that a receiving process could not read any data from the pipe buffer until the sending process finished completely. Besides causing disk traffic, if one doesn't install a harddisk cache such as SMARTDRV, that would have been unnecessary under multi-tasking operating systems, this implementation also made pipes unsuitable for applications requiring real-time response, like, for example, interactive purposes (where the user enters commands that the first process in the pipeline receives via stdin, and the last process in the pipeline presents its output to the user via stdout).

Also, commands that produce a potentially infinite amount of output, such as the yes
Yes (Unix)
yes is a Unix command, which outputs an affirmative response, or a user-defined string of text continuously until killed.-Description:By itself, the yes command outputs 'y' or whatever is specified as an argument, followed by a newline repeatedly until stopped by the user or otherwise killed; when...

 command, cannot be used in a pseudo-pipeline, since they would run until the temporary disk space is exhausted, so the following processes in the pipeline could not even start to run.

Object pipelines

Beside byte stream-based pipelines, there are also object pipelines. In an object pipeline, the processes output objects instead of texts; therefore removing the string parsing tasks that are common in UNIX shell scripts. Windows PowerShell
Windows PowerShell
Windows PowerShell is Microsoft's task automation framework, consisting of a command-line shell and associated scripting language built on top of, and integrated with the .NET Framework...

 uses this scheme and transfers .NET objects. Channel
Channel (programming)
A Channel is a construct used in interprocess communication to represent some binding between concurrent processes. An object may be sent over a channel, and a process is able to receive any objects sent over a channel it has a reference to...

s, found in the Limbo programming language
Limbo programming language
Limbo is a programming language for writing distributed systems and is the language used to write applications for the Inferno operating system...

, and the IPython
IPython
IPython is an interactive shell for the Python programming language that offers enhanced introspection, additional shell syntax, tab completion and rich history.- Other features :...

 ipipe extension are other examples of this metaphor.

Pipelines in GUIs

Graphical environments such as RISC OS
RISC OS
RISC OS is a computer operating system originally developed by Acorn Computers Ltd in Cambridge, England for their range of desktop computers, based on their own ARM architecture. First released in 1987, under the name Arthur, the subsequent iteration was renamed as in 1988...

 and ROX Desktop
ROX Desktop
The ROX Desktop is a graphical desktop environment for the X Window System. It is based on the ROX-Filer which is a drag and drop spatial file manager. It is free software released under the GNU General Public License. The environment was inspired by the user interface of RISC OS...

 also make use of pipelines. Rather than providing a save dialog box
Dialog box
In a graphical user interface of computers, a dialog box is a type of window used to enable reciprocal communication or "dialog" between a computer and its user. It may communicate information to the user, prompt the user for a response, or both...

 containing a file manager
File manager
A file manager or file browser is a computer program that provides a user interface to work with file systems. The most common operations performed on files or groups of files are: create, open, edit, view, print, play, rename, move, copy, delete, search/find, and modify file attributes, properties...

 to let the user specify where a program should write data, RISC OS and ROX provide a save dialog box containing an icon
Icon (computing)
A computer icon is a pictogram displayed on a computer screen and used to navigate a computer system or mobile device. The icon itself is a small picture or symbol serving as a quick, intuitive representation of a software tool, function or a data file accessible on the system. It functions as an...

 (and a field to specify the name). The destination is specified by dragging and dropping
Drag-and-drop
In computer graphical user interfaces, drag-and-drop is the action of selecting a virtual object by "grabbing" it and dragging it to a different location or onto another virtual object...

 the icon. The user can drop the icon anywhere an already-saved file could be dropped, including onto icons of other programs. If the icon is dropped onto a program's icon, it's loaded and the contents that would otherwise have been saved are passed in on the new program's standard input stream.

For instance, a user browsing the world-wide web might come across a .gz compressed image which they want to edit and re-upload. Using GUI pipelines, they could drag the link to their de-archiving program, drag the icon representing the extracted contents to their image editor, edit it, open the save as dialog, and drag its icon to their uploading software.

Conceptually, this method could be used with a conventional save dialog box, but this would require the user's programs to have an obvious and easily-accessible location in the filesystem that can be navigated to. In practice, this is often not the case, so GUI pipelines are rare.

Other considerations

The name 'pipeline' comes from a rough analogy with physical plumbing in that a pipeline usually allows information to flow in only one direction, like water often flows in a pipe.

Pipes and filters
Filter (software)
A filter is a computer program to process a data stream. Some operating systems such as Unix are rich with filter programs. Even Windows has some simple filters built into its command shell, most of which have significant enhancements relative to the similar filter commands that were available in...

 can be viewed as a form of functional programming
Functional programming
In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state...

, using byte streams as data objects; more specifically, they can be seen as a particular form of monad
Monads in functional programming
In functional programming, a monad is a programming structure that represents computations. Monads are a kind of abstract data type constructor that encapsulate program logic instead of data in the domain model...

 for I/O
I/O
I/O may refer to:* Input/output, a system of communication for information processing systems* Input-output model, an economic model of flow prediction between sectors...

.

The concept of pipeline is also central to the Cocoon
Apache Cocoon
Apache Cocoon, usually just called Cocoon, is a web application framework built around the concepts of pipeline, separation of concerns and component-based web development. The framework focuses on XML and XSLT publishing and is built using the Java programming language...

 web development framework
Software framework
In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by user code, thus providing application specific software...

 or to any XProc
XProc
XProc is a W3C Recommendation to define an XML transformation language to define XML Pipelines.Below is an example abbreviated XProc file: This is a pipeline that consists of two atomic steps, XInclude and Validate...

 (the W3C Standards) implementations, where it allows a source stream to be modified before eventual display.

This pattern encourages the use of text streams as the input and output of programs. This reliance on text has to be accounted when creating graphic
Gui
Gui or guee is a generic term to refer to grilled dishes in Korean cuisine. These most commonly have meat or fish as their primary ingredient, but may in some cases also comprise grilled vegetables or other vegetarian ingredients. The term derives from the verb, "gupda" in Korean, which literally...

 shells to text programs.

History

Process pipelines were invented by Douglas McIlroy
Douglas McIlroy
Malcolm Douglas McIlroy is a mathematician, engineer, and programmer. As of 2007 he is an Adjunct Professor of Computer Science at Dartmouth College. Dr...

, one of the designers of the first Unix shell
Unix shell
A Unix shell is a command-line interpreter or shell that provides a traditional user interface for the Unix operating system and for Unix-like systems...

s, and greatly contributed to the popularity of that operating system. It can be considered the first non-trivial instance of software componentry.

The idea was eventually ported to other operating systems, such as DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

, OS/2
OS/2
OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...

, Windows NT
Windows NT
Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. It was a powerful high-level-language-based, processor-independent, multiprocessing, multiuser operating system with features comparable to Unix. It was intended to complement...

, BeOS
BeOS
BeOS is an operating system for personal computers which began development by Be Inc. in 1991. It was first written to run on BeBox hardware. BeOS was optimized for digital media work and was written to take advantage of modern hardware facilities such as symmetric multiprocessing by utilizing...

, AmigaOS
AmigaOS
AmigaOS is the default native operating system of the Amiga personal computer. It was developed first by Commodore International, and initially introduced in 1985 with the Amiga 1000...

, MorphOS
MorphOS
MorphOS is an Amiga-compatible computer operating system. It is a mixed proprietary and open source OS produced for the Pegasos PowerPC processor based computer, PowerUP accelerator equipped Amiga computers, and a series of Freescale development boards that use the Genesi firmware, including the...

 and Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

 (the latter being a UNIX OS).

See also

  • Anonymous pipe
    Anonymous pipe
    In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication . An implementation is often integrated into the operating system's file IO subsystem...

  • Component-based software engineering
    Component-based software engineering
    Component-based software engineering is a branch of software engineering that emphasizes the separation of concerns in respect of the wide-ranging functionality available throughout a given software system...

  • GStreamer
    GStreamer
    GStreamer is a pipeline-based multimedia framework written in the C programming language with the type system based on GObject.GStreamer allows a programmer to create a variety of media-handling components, including simple audio playback, audio and video playback, recording, streaming and editing...

     for a multimedia framework built on plugin pipelines
  • Named pipe
    Named pipe
    In computing, a named pipe is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication. The concept is also found in Microsoft Windows, although the semantics differ substantially...

    , an operating system construct intermediate to anonymous pipe and file.
  • Pipeline (computing) for other computer-related versions of the concept.
  • Pipeline (Unix)
    Pipeline (Unix)
    In Unix-like computer operating systems , a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process feeds directly as input to the next one. Each connection is implemented by an anonymous pipe...

     for details specific to Unix
    Unix
    Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

    .
  • Plumber
    Plumber (program)
    The plumber, in the Plan 9 from Bell Labs and Inferno operating systems, is a mechanism for interprocess communication, somewhat similar to copy and paste.The plumber is a program which handles all the messaging when programs make plumbing messages...

     - "intelligent pipes" developed as part of Plan 9
    Plan 9 from Bell Labs
    Plan 9 from Bell Labs is a distributed operating system. It was developed primarily for research purposes as the successor to Unix by the Computing Sciences Research Center at Bell Labs between the mid-1980s and 2002...

  • Programming in the large
    Programming in the large
    In software development, programming in the large and programming in the small describe two different approaches to writing software. The terms were coined by Frank DeRemer and Hans Kron in their 1975 paper "Programming-in-the large versus programming-in-the-small" Fred Brooks identifies that the...

  • Software design pattern
  • producer-consumer problem
    Producer-consumer problem
    In problem is a classical example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue. The producer's job is to generate a piece of data, put it into the buffer and start again...

     for implementation aspects of software pipelines.
  • XML pipeline
    XML pipeline
    In software, an XML Pipeline is formed when XML processes, especially XML transformations and XML validations, are connected together....

     for processing of XML
    XML
    Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

    files

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK