
Dataflow
    
Dataflow is a term used in computing, and may have various shades of meaning. It is closely related to message passing.
Software architecture
Dataflow is a software architecture based on the idea that changing the value of a variable should automatically force recalculation of the values of variables which depend on its value.
Dataflow programming embodies these principles, with spreadsheets perhaps the most widespread embodiment of dataflow. For example, in a spreadsheet you can specify a cell formula which depends on other cells; when any of those cells is updated, the first cell's value is automatically recalculated. One change can initiate a whole sequence of changes if one cell depends on another cell which depends on yet another cell, and so on.
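The spreadsheet behaviour described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular spreadsheet is implemented; the `Cell` class and the names `a1`, `b1`, and `c1` are hypothetical.

```python
class Cell:
    """A spreadsheet-like cell holding either a constant or a formula."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        # Register this cell with every cell it reads from.
        for cell in self.inputs:
            cell.dependents.append(self)
        self.value = value
        if formula is not None:
            self.recalculate()

    def set(self, value):
        """Change a constant cell and propagate the change downstream."""
        self.value = value
        for cell in self.dependents:
            cell.recalculate()

    def recalculate(self):
        """Recompute this cell from its inputs, then notify its dependents."""
        self.value = self.formula(*(c.value for c in self.inputs))
        for cell in self.dependents:
            cell.recalculate()


a1 = Cell(value=2)
b1 = Cell(formula=lambda a: a * 10, inputs=[a1])  # b1 = a1 * 10
c1 = Cell(formula=lambda b: b + 1, inputs=[b1])   # c1 = b1 + 1

a1.set(5)                  # one change triggers a chain of recalculations
print(b1.value, c1.value)  # 50 51
```

This naive propagation may recompute a cell more than once when dependencies form a diamond shape, and it does not guard against cycles; a fuller implementation would typically order the recalculation first.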
The dataflow technique is not restricted to recalculating numeric values, as done in spreadsheets. For example, dataflow can be used to redraw a picture in response to mouse movements, or to make a robot turn in response to a change in light level.
One benefit of dataflow is that it can reduce the amount of coupling-related code in a program. For example, without dataflow, if a variable Y depends on a variable X, then whenever X is changed Y must be explicitly recalculated. This means that Y is coupled to X: the update operation must be written explicitly into the program, and checks must eventually be added to avoid cyclical dependencies. Dataflow improves this situation by making the recalculation of Y automatic, thereby eliminating the coupling from X to Y. Dataflow makes implicit a significant amount of computation that must be expressed explicitly in other programming paradigms.
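The cyclical-dependency check mentioned above is one of the things a dataflow runtime can take over from the programmer. As a minimal sketch, assuming dependencies are given as a plain mapping, Python's standard `graphlib` module can compute a safe recalculation order and report cycles. The function name `recalculation_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def recalculation_order(depends_on):
    """Return an order in which variables can safely be recomputed.

    depends_on maps each variable name to the names it depends on.
    Raises ValueError if the dependencies form a cycle.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"cyclic dependency: {exc.args[1]}") from exc


# Y depends on X, and Z depends on Y.
order = recalculation_order({"Y": {"X"}, "Z": {"Y"}})
print(order)  # ['X', 'Y', 'Z']
```

Passing a mapping such as `{"X": {"Y"}, "Y": {"X"}}` raises `ValueError`, which is the kind of check that would otherwise have to be written by hand alongside every explicit update.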
Dataflow is also sometimes referred to as reactive programming.
There have been a few programming languages created specifically to support dataflow. In particular, many (if not most) visual programming languages have been based on the idea of dataflow.
Distributed data flows have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes a high-level functional style of specification and simplifies formal reasoning about system components.
Hardware architecture
Hardware architectures for dataflow were a major topic in computer architecture research in the 1970s and early 1980s. Jack Dennis of MIT pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to be executed simultaneously because the simple tags could not differentiate between them. Designs that use content-addressable memory are called dynamic dataflow machines by Arvind. They use tags in memory to facilitate parallelism.
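The role of tags in a dynamic dataflow machine can be illustrated with a toy software model. This is a loose sketch, not a description of any real machine: the `MatchingStore` class, its two-operand assumption, and the token layout are all hypothetical. The dictionary stands in for the content-addressable memory, and the tag distinguishes simultaneous activations of the same instruction.

```python
from collections import defaultdict


class MatchingStore:
    """Toy matching store: operands wait here until their partners arrive."""

    def __init__(self, arity=2):
        self.arity = arity
        # (instruction, tag) -> {port: value}; plays the associative memory.
        self.waiting = defaultdict(dict)

    def receive(self, instruction, tag, port, value):
        """Store a token; return the operand list once every port is filled."""
        key = (instruction, tag)
        slots = self.waiting[key]
        slots[port] = value
        if len(slots) == self.arity:
            del self.waiting[key]
            return [slots[p] for p in range(self.arity)]
        return None


store = MatchingStore()
fired = []
# Two activations (tags 1 and 2) of the same "add" instruction, interleaved.
tokens = [("add", 1, 0, 10), ("add", 2, 0, 100),
          ("add", 2, 1, 200), ("add", 1, 1, 20)]
for instruction, tag, port, value in tokens:
    operands = store.receive(instruction, tag, port, value)
    if operands is not None:
        fired.append((tag, sum(operands)))

print(fired)  # [(2, 300), (1, 30)]
```

Without the tag in the key, the operands of the two activations would be indistinguishable, which is the limitation the text attributes to static dataflow machines.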
Data flows through the components of a computer: it enters through input devices and can leave through output devices such as a printer.
Concurrency
A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over channels (see message passing).
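A small dataflow network of this kind can be sketched with Python threads as the processes and `queue.Queue` objects as the channels. The three stages and the `DONE` sentinel are illustrative choices, not part of any standard dataflow API.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out):
    """Source process: emit the integers 0..4, then signal completion."""
    for n in range(5):
        out.put(n)
    out.put(DONE)


def square(inp, out):
    """Transform process: square every item read from the input channel."""
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def collect(inp, results):
    """Sink process: gather everything from the input channel into a list."""
    while (item := inp.get()) is not DONE:
        results.append(item)


a, b = queue.Queue(), queue.Queue()  # the channels between processes
results = []
stages = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=square, args=(a, b)),
    threading.Thread(target=collect, args=(b, results)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()

print(results)  # [0, 1, 4, 9, 16]
```

Each process only reads from and writes to its channels, so the stages can run concurrently without sharing any other state.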
In Kahn process networks, named after Gilles Kahn, the processes are determinate. This implies that each determinate process computes a continuous function from input streams to output streams, and that a network of determinate processes is itself determinate, thus computing a continuous function. This implies that the behavior of such networks can be described by a set of recursive equations, which can be solved using fixpoint theory. The movement and transformation of the data are represented by a series of shapes and lines.
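The recursive-equation view can be written out as follows. The notation is introduced here for illustration and is not taken from the text: $X_1, \dots, X_n$ denote the streams on the channels, each $f_i$ is the continuous function computed by the process that writes $X_i$, and $\bot$ is the tuple of empty streams under the prefix ordering.

```latex
\begin{aligned}
X_i &= f_i(X_1, \dots, X_n), \qquad i = 1, \dots, n \\
X &= F(X), \qquad X = (X_1, \dots, X_n) \\
X^{*} &= \bigsqcup_{k \ge 0} F^{k}(\bot)
\end{aligned}
```

Under these assumptions, the least solution $X^{*}$ is the limit of repeatedly applying $F$ starting from empty streams (Kleene's fixed-point theorem), which corresponds to the history of values the network actually produces.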
See also
- Data flow diagram
- Flow (disambiguation)
- Dataflow programming
- Lazy evaluation
- Complex event processing
- Pure Data
- Flow-based programming (FBP)
- Functional reactive programming
- Oz programming language
- Lucid programming language
- Packet flow
- Signal flow
- Signal-flow graph
- Data-flow analysis
External links
- BMDFM: Binary Modular Dataflow Machine.
- Cantata: Dataflow Visual Language for image processing.
- Cells: Dataflow extension to Common Lisp Object System, CLOS.
- Stella: Dataflow Visual Language for dynamic dataflow modeling and simulation.
- KPASSA : a tool for static-scheduling, performance analysis and optimizations for DataFlow models.
- Liquid Rebol
- NuParadigm : NuParadigm offers a workflow automation and document imaging suite called DataFlow.
- SDF3 : Performance analysis tool for DataFlow Model
- Ruby Dataflow : Ruby gem adding Dataflow variable support
- Acar et al., Adaptive Functional Programming, POPL 2002


