Remote Direct Memory Access
In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.
RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work by CPUs, caches, or context switches, and they continue in parallel with other system operations. When an application performs an RDMA read or write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer.
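On Linux this data path is typically driven through a verbs-style API such as libibverbs. The sketch below is pseudocode, not a complete program: connection setup, queue-pair state transitions, and error handling are omitted, and the remote address and rkey are assumed to have been exchanged out of band beforehand.

```
/* Pseudocode sketch of a one-sided RDMA Write, using libibverbs names */
pd = ibv_alloc_pd(context)                       /* protection domain       */
mr = ibv_reg_mr(pd, buf, len,                    /* register (pin) buffer   */
                IBV_ACCESS_LOCAL_WRITE)
cq = ibv_create_cq(context, depth)               /* completion queue        */
qp = ibv_create_qp(pd, {send_cq: cq, ...})       /* queue pair; connect it  */

wr.opcode              = IBV_WR_RDMA_WRITE       /* one-sided write         */
wr.sg_list             = {addr: buf, length: len, lkey: mr->lkey}
wr.wr.rdma.remote_addr = remote_addr             /* learned out of band     */
wr.wr.rdma.rkey        = remote_rkey

ibv_post_send(qp, &wr, &bad_wr)                  /* hand off to the NIC     */
while (ibv_poll_cq(cq, 1, &wc) == 0) { }         /* local completion only;  */
                                                 /* the target is NOT told  */
```

Note that the completion reported here is on the initiator's side only, which leads directly to the notification problem discussed next.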
This strategy presents several problems related to the fact that the target node is not notified of the completion of the request (one-sided communication). The common way to notify it is to change a memory byte when the data has been delivered, but this requires the target to poll on that byte. Not only does this polling consume CPU cycles, but the memory footprint and the latency increase linearly with the number of possible peer nodes, which limits the use of RDMA in high-performance computing (HPC).
RDMA reduces network protocol overhead, leading to improvements in communication latency. Reductions in protocol overhead can increase a network's ability to move data quickly, allowing applications to get the data they need faster and in turn leading to more scalable clusters. However, there is a tradeoff between this reduction in network protocol overhead and the additional overhead that pinning virtual memory pages incurs on each node. In particular, zero-copy RDMA protocols require that the memory pages involved in a transaction be pinned, at least for the duration of the transfer. Otherwise, RDMA pages might be paged out to disk and replaced with other data by the operating system, causing the DMA engine (which knows nothing of the virtual memory system maintained by the operating system) to send the wrong data. The net result of not pinning the pages in a zero-copy RDMA system can be corruption of the contents of memory in the distributed system. Pinning memory takes time and additional memory to set up, reduces the quantity of memory the operating system can allocate to processes, limits the overall flexibility of the memory system to adapt over time, and can even lead to underutilization of memory if processes unnecessarily pin pages. The net result is the introduction of latency, sometimes in linear proportion to the number of pages of data pinned in memory. To mitigate these problems, several techniques for interfacing with RDMA devices were developed:
- using caching techniques to keep data pinned as long as possible, producing overhead reductions for applications that repeatedly communicate with the same memory area
- pipelining memory pinning operations and data transfer (as done on InfiniBand or Myrinet)
- deferring memory pinning out of the critical path, thus hiding the latency increase
- entirely removing the need for pinning (as Quadrics does)
In contrast, the Send/Recv model used by other zero-copy HPC interconnects, such as Myrinet or Quadrics, does not have the one-sided communication problem or the memory paging problem described above, while providing comparable reductions in latency when used in conjunction with HPC communication frameworks that expose the Send/Recv model to the programmer (such as MPI).
Much like other HPC interconnects, RDMA's acceptance is currently limited by the need to install a different networking infrastructure. However, new standards enable Ethernet RDMA implementation at the physical layer, using TCP/IP as the transport, combining the performance and latency advantages of RDMA with a low-cost, standards-based solution. The RDMA Consortium and the DAT Collaborative have played key roles in the development of RDMA protocols and APIs for consideration by standards groups such as the Internet Engineering Task Force and the Interconnect Software Consortium. Software vendors such as Red Hat and Oracle Corporation support these APIs in their latest products, and network adapters that implement RDMA over Ethernet are being developed. Both Red Hat Enterprise Linux and Red Hat Enterprise MRG support RDMA.
Common RDMA implementations include the Virtual Interface Architecture, InfiniBand, and iWARP.
External links
- RDMA Consortium
- InfiniBand Trade Association
- A Tutorial of the RDMA Model
- RDMA usage
- A Critique of RDMA for High-Performance Computing