Distributed data flow
Distributed data flow refers to a set of events in a distributed application or protocol that satisfies the following informal properties:
  • Asynchronous, non-blocking, and one-way. Each event represents a single instance of a non-blocking, one-way, asynchronous method invocation or other form of explicit or implicit message passing between two layers or software components. For example, each event might represent a single request to multicast a packet, issued by an application layer to an underlying multicast protocol. The requirement that events be one-way and asynchronous is important: invocations of methods that may return results would normally be represented as two separate flows, one that represents the requests and another that represents the responses.

  • Homogeneous, unidirectional, and uniform. All events in the distributed flow serve the same functional and logical purpose and are related to one another; generally, we require that they represent method calls or message exchanges between instances of the same functional layers, or instances of the same components, but perhaps on different nodes within a computer network. Furthermore, all events must flow in the same direction (i.e., one type of layer or component always produces the events and the other always consumes them), and carry the same type of payload. For example, a set of events that includes all multicast requests issued by the same application layer to the same multicast protocol is a distributed flow. On the other hand, a set of events that includes multicast requests made by different applications to different multicast protocols would not be considered a distributed flow, and neither would a set of events that represents multicast requests as well as acknowledgments and error notifications.

  • Concurrent, continuous, and distributed. The flow usually includes all events that flow between the two layers of software, simultaneously at different locations, and over a finite or infinite period of time. Thus, in general, events in a distributed flow are distributed both in space (they occur at different nodes) and in time (they occur at different times). For example, the flow of multicast requests would include all such requests made by instances of the given application on different nodes; normally, such a flow would include events that occur on all nodes participating in the given multicast protocol. A flow in which all events occur at the same node would be considered degenerate.
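The structural parts of these informal requirements can be illustrated with a small Python sketch. It assumes a hypothetical `Event` record that tags each event with the producing and consuming component types, a payload kind, and the node where it occurred; these field and function names are illustrative only and do not come from any standard API. The sketch checks only what can be checked mechanically (one direction, one payload kind, more than one node); whether the events truly serve the same logical purpose is left to the designer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record used only for illustration."""
    producer: str  # component type that emits the event, e.g. "app"
    consumer: str  # component type that handles it, e.g. "multicast"
    kind: str      # payload type, e.g. "multicast_request"
    node: str      # node where the event occurs, e.g. a network address


def is_homogeneous(events):
    """True if all events flow in one direction between the same two
    component types and carry the same kind of payload."""
    signatures = {(e.producer, e.consumer, e.kind) for e in events}
    return len(signatures) <= 1


def is_degenerate(events):
    """True if every event occurs at the same node (or there are none)."""
    return len({e.node for e in events}) <= 1


# Multicast requests issued by the same application layer on two nodes.
requests = [
    Event("app", "multicast", "multicast_request", "10.0.0.1"),
    Event("app", "multicast", "multicast_request", "10.0.0.2"),
]
# Adding an acknowledgment breaks homogeneity: it flows the other way
# and carries a different kind of payload.
mixed = requests + [Event("multicast", "app", "ack", "10.0.0.1")]

print(is_homogeneous(requests), is_degenerate(requests))  # True False
print(is_homogeneous(mixed))                              # False
```

Under this reading, a method that returns results would be modeled as two such sets, one for requests and one for responses, each homogeneous on its own.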


Formally, we represent each event in a distributed flow as a quadruple of the form (x,t,k,v), where x is the location (e.g., the network address of a physical node) at which the event occurs, t is the time at which this happens, k is a version, or a sequence number identifying the particular event, and v is a value that represents the event payload (e.g., all the arguments passed in a method call). Each distributed flow is a (possibly infinite) set of such quadruples that satisfies the following three formal properties.
  • For any finite point in time t, there can be only finitely many events in the flow that occur at time t or earlier. This implies that in any flow, one can always point to the point in time at which the flow originated. The flow itself can be infinite; in such a case, at any point in time, a new event will eventually appear in the flow.

  • For any pair of events e_1 and e_2 that occur at the same location, if e_1 occurs at an earlier time than e_2, then the version number in e_1 must also be smaller than that of e_2.

  • For any pair of events e_1 and e_2 that occur at the same location, if the two events have the same version numbers, they must also have the same values.
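The quadruple representation and the last two formal properties can be checked on a finite set of events with the Python sketch below. The names `Event` and `satisfies_flow_properties` are illustrative, not part of any library. The first property (only finitely many events at or before any time t) cannot be violated by a finite set, so the sketch does not test it; it only constrains infinite flows, which a program can inspect only through finite prefixes.

```python
from collections import defaultdict
from typing import Any, Hashable, Iterable, NamedTuple


class Event(NamedTuple):
    x: Hashable  # location, e.g. the network address of a node
    t: float     # time at which the event occurs
    k: int       # version or sequence number
    v: Any       # payload, e.g. the arguments of a method call


def satisfies_flow_properties(events: Iterable[Event]) -> bool:
    """Check the per-location ordering and uniqueness properties
    on a finite set of events."""
    by_location = defaultdict(list)
    for e in events:
        by_location[e.x].append(e)

    for local in by_location.values():
        for a in local:
            for b in local:
                # Earlier time at the same location => smaller version.
                if a.t < b.t and not a.k < b.k:
                    return False
                # Same version at the same location => same value.
                if a.k == b.k and a.v != b.v:
                    return False
    return True


flow = {Event("A", 1.0, 1, "p1"), Event("A", 2.0, 2, "p2"), Event("B", 1.5, 1, "p1")}
out_of_order = {Event("A", 1.0, 2, "p1"), Event("A", 2.0, 1, "p2")}
conflicting = {Event("A", 1.0, 1, "p1"), Event("A", 1.0, 1, "p2")}

print(satisfies_flow_properties(flow))          # True
print(satisfies_flow_properties(out_of_order))  # False
print(satisfies_flow_properties(conflicting))   # False
```

Note that the properties constrain only events at the same location: versions and values at node "B" are independent of those at node "A".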


Flows can also have a number of additional properties.
  • Consistency. A distributed flow is said to be consistent if events with the same version always have the same value, even if they occur at different locations. Consistent flows typically represent various sorts of global decisions made by the protocol or application.

  • Monotonicity. A distributed flow is said to be weakly monotonic if for any pair of events e_1 and e_2 that occur at the same location, if e_1 has a smaller version than e_2, then e_1 must carry a smaller value than e_2. A distributed flow is said to be strongly monotonic (or simply monotonic) if this is true even for pairs of events e_1 and e_2 that occur at different locations. Strongly monotonic flows are always consistent. They typically represent various sorts of irreversible decisions. Weakly monotonic flows may or may not be consistent.
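The consistency and monotonicity properties can likewise be sketched in Python for a finite set of events whose payloads are mutually comparable. The text says "smaller"; this sketch assumes the non-strict reading (a version no greater than another implies a value no greater than the other's), under which strong monotonicity does imply consistency, since two events with equal versions must then carry equal values. The function names are illustrative only.

```python
from typing import Any, Hashable, Iterable, NamedTuple


class Event(NamedTuple):
    x: Hashable  # location
    t: float     # time
    k: int       # version
    v: Any       # payload; assumed comparable with <=


def is_consistent(events: Iterable[Event]) -> bool:
    """Events with the same version carry the same value, at any location."""
    value_of = {}
    for e in events:
        if value_of.setdefault(e.k, e.v) != e.v:
            return False
    return True


def is_weakly_monotonic(events: Iterable[Event]) -> bool:
    """At each location, k1 <= k2 implies v1 <= v2 (non-strict reading)."""
    evs = list(events)
    return all(a.v <= b.v for a in evs for b in evs if a.x == b.x and a.k <= b.k)


def is_strongly_monotonic(events: Iterable[Event]) -> bool:
    """k1 <= k2 implies v1 <= v2 for every pair, across all locations."""
    evs = list(events)
    return all(a.v <= b.v for a in evs for b in evs if a.k <= b.k)


# Both nodes agree on each decision, and later decisions never go back.
agreed = [Event("A", 1, 1, 10), Event("B", 2, 1, 10),
          Event("A", 3, 2, 20), Event("B", 4, 2, 20)]
# Each node is monotonic on its own, but the nodes disagree.
diverged = [Event("A", 1, 1, 10), Event("A", 2, 2, 20),
            Event("B", 1, 1, 30), Event("B", 2, 2, 40)]

print(is_consistent(agreed), is_weakly_monotonic(agreed), is_strongly_monotonic(agreed))
# True True True
print(is_consistent(diverged), is_weakly_monotonic(diverged), is_strongly_monotonic(diverged))
# False True False
```

The second example shows a weakly monotonic flow that is not consistent. The pairwise loops are quadratic and kept only to mirror the definitions; a larger checker would more likely group or sort events by version.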


Distributed data flows serve a purpose analogous to variables or method parameters in programming languages such as Java, in that they can represent state that is stored or communicated by a layer of software. Unlike variables or parameters, which represent a unit of state that resides in a single location, distributed flows are dynamic and distributed: they appear in multiple locations within the network at the same time. As such, distributed flows are a more natural way of modeling the semantics and inner workings of certain classes of distributed systems. In particular, the distributed data flow abstraction has been used as a convenient way of expressing the high-level logical relationships between parts of distributed protocols.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 