RDMA over Converged Ethernet
RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access over an Ethernet network. RoCE is a link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.
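
Because RoCE sits directly on the Ethernet link layer, a RoCE frame is an ordinary Ethernet frame whose payload is an InfiniBand transport packet rather than an IP packet. The following minimal sketch packs only the 14-byte Ethernet header. It assumes the EtherType 0x8915 registered for RoCE, which the text above does not state; the MAC addresses and the helper names mac_to_bytes and ethernet_header are hypothetical and chosen for illustration only.

```python
import struct

# Assumption: 0x8915 is the EtherType registered for RoCE frames.
ROCE_ETHERTYPE = 0x8915


def mac_to_bytes(mac: str) -> bytes:
    """Convert a MAC string such as 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw


def ethernet_header(dst_mac: str, src_mac: str,
                    ethertype: int = ROCE_ETHERTYPE) -> bytes:
    """Build an untagged Ethernet II header: destination, source, EtherType."""
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)


# Hypothetical locally administered addresses for two hosts
# in the same broadcast domain.
header = ethernet_header("02:00:00:00:00:02", "02:00:00:00:00:01")
print(len(header), header.hex())
```

Nothing in this header carries routing information beyond the MAC addresses, which is why the two hosts have to share an Ethernet broadcast domain.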

Background information

Network-intensive applications like networked storage or cluster computing need a network infrastructure with high bandwidth and low latency. The advantages of RDMA over other network APIs like the Berkeley socket API are lower latency, lower CPU load and higher bandwidth. The RoCE protocol allows lower latencies than its predecessor, the iWARP protocol. There exist RoCE host channel adapters (HCAs) with a latency as low as 1.3 microseconds, while the lowest known iWARP HCA latency today is 2 microseconds.

RoCE versus InfiniBand

RoCE defines how to perform RDMA over Ethernet, while the InfiniBand Architecture Specification defines how to perform RDMA over an InfiniBand network. RoCE is expected to bring InfiniBand applications, which are predominantly based on clusters, onto a common converged Ethernet fabric. Others expect that InfiniBand will keep offering higher bandwidth and lower latency than what is possible with RoCE. While Ethernet is a more familiar technology to most people than InfiniBand, the cost of InfiniBand equipment, especially switches, is lower than that of 10 GbE equipment. Another difference between the Ethernet and InfiniBand technologies is that InfiniBand networks are more energy efficient.

RoCE versus iWARP

While the RoCE specification defines how to perform RDMA over the Ethernet link layer, iWARP is a standard that defines how to perform RDMA over a connection-oriented transport like TCP. That means that, unlike RoCE, iWARP is neither bound to Ethernet nor limited to a single Ethernet broadcast domain. However, the memory requirements of many connections, along with TCP's flow and reliability controls, lead to scalability and performance issues for large-scale HPC and datacenter applications. Also, multicast is defined in the RoCE specification, while the current iWARP specification does not define how to perform multicast RDMA.

Criticism

Some aspects that should have been defined in the RoCE specification have been left out. These are:
  • How to translate between primary RoCE GIDs and Ethernet MAC addresses.
  • How to translate between secondary RoCE GIDs and Ethernet MAC addresses. It is not clear whether it is possible to implement secondary GIDs in the RoCE protocol without adding a RoCE-specific address resolution protocol.
  • How to implement VLANs for the RoCE protocol. Current implementations store the VLAN ID in the twelfth and thirteenth byte of the sixteen-byte GID, although the RoCE specification does not mention VLANs at all.
  • How to translate between RoCE multicast GIDs and Ethernet MAC addresses. Current implementations use the same address mapping that has been specified for mapping IPv6 multicast addresses to Ethernet MAC addresses. This is dangerous, though: on a network where MLD snooping has been enabled in the Ethernet switches, if a RoCE and an IPv6 multicast address map to the same Ethernet address, MLD snooping may keep the RoCE traffic from being sent out over all the switch ports it should reach.
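
The two conventions that the list attributes to current implementations can be sketched as follows. This is only an illustration under stated assumptions, not code taken from any RoCE stack. It assumes that a GID can be parsed like an IPv6 address because both are sixteen bytes long, that the IPv6 multicast mapping is the usual prefix 33:33 followed by the last four bytes of the address, and that a value outside the 12-bit VLAN range in bytes twelve and thirteen means "no VLAN". The function names and all example GIDs are hypothetical.

```python
import ipaddress


def multicast_gid_to_mac(gid: str) -> str:
    """Map a multicast GID to a MAC address the way IPv6 multicast is mapped:
    33:33 followed by the last four bytes of the 16-byte address."""
    raw = ipaddress.IPv6Address(gid).packed
    if raw[0] != 0xFF:
        raise ValueError(f"not a multicast GID: {gid}")
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + raw[12:])


def vlan_id_from_gid(gid: str):
    """Read the VLAN ID from the twelfth and thirteenth byte of the GID
    (indices 11 and 12 when counting from zero)."""
    raw = ipaddress.IPv6Address(gid).packed
    vid = (raw[11] << 8) | raw[12]
    # Assumption of this sketch: anything beyond 12 bits is not a VLAN ID.
    return vid if vid < 0x1000 else None


# A hypothetical RoCE multicast GID and an unrelated IPv6 multicast address
# that share their last four bytes end up on the same Ethernet address.
roce_mac = multicast_gid_to_mac("ff0e::ff00:1234")
ipv6_mac = multicast_gid_to_mac("ff02::1:ff00:1234")
print(roce_mac, ipv6_mac, roce_mac == ipv6_mac)

# Hypothetical link-local style GIDs with and without an embedded VLAN ID.
print(vlan_id_from_gid("fe80::2:c900:6400:1"))
print(vlan_id_from_gid("fe80::2:c9ff:fe00:1"))
```

Since only four of the sixteen GID bytes survive the multicast mapping, collisions like the one above cannot be ruled out, which is the situation in which MLD snooping can misdirect RoCE traffic.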

See also

  • Data center bridging (DCB), sometimes called Converged Ethernet or Converged Enhanced Ethernet.
  • Remote direct memory access (RDMA).
  • InfiniBand.
  • iWARP.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 