Internet socket
In computer networking, an Internet socket or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the Internet.
The term Internet sockets is also used as a name for an application programming interface (API) for the TCP/IP protocol stack, usually provided by the operating system. Internet sockets constitute a mechanism for delivering incoming data packets to the appropriate application process or thread, based on a combination of local and remote IP addresses and port numbers. Each socket is mapped by the operating system to a communicating application process or thread.
A socket address is the combination of an IP address (the location of the computer) and a port (which is mapped to the application program process) into a single identity, much like one end of a telephone connection is the combination of a phone number and a particular extension.
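As a minimal sketch of the idea, Python's standard `socket` module represents an IPv4 socket address as an `(IP address, port)` pair. The loopback address and the helper name `local_socket_address` are assumptions made for this illustration; port 0 asks the operating system to pick any free port.

```python
import socket

def local_socket_address():
    """Bind a TCP socket to loopback and return its (IP address, port) pair."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the operating system to choose any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()

ip, port = local_socket_address()
print(f"socket address: {ip}:{port}")
```

The port printed varies from run to run, because the operating system assigns it.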
Overview
An Internet socket is characterized by a unique combination of the following:
- Local socket address: Local IP address and port number
- Remote socket address: Only for established TCP sockets. As discussed in the client-server section below, this is necessary since a TCP server may serve several clients concurrently. The server creates one socket for each client, and these sockets share the same local socket address.
- Protocol: A transport protocol (e.g., TCP or UDP), raw IP, or others. TCP port 53 and UDP port 53 are consequently different, distinct sockets.
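The role of the protocol in the list above can be illustrated with a short sketch that binds a TCP socket and a UDP socket to the same port number. It uses an operating-system-chosen port on loopback rather than port 53, which normally requires elevated privileges. The helper name `bind_tcp_and_udp_same_port` and the retry count are assumptions made for this example.

```python
import socket

def bind_tcp_and_udp_same_port(attempts=5):
    """Bind one TCP and one UDP socket to the same port number on loopback."""
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            tcp.bind(("127.0.0.1", 0))      # let the OS choose a free TCP port
            port = tcp.getsockname()[1]
            udp.bind(("127.0.0.1", port))   # reuse the same number for UDP
            return tcp, udp
        except OSError:
            # The UDP port happened to be in use by another program; retry.
            tcp.close()
            udp.close()
    raise RuntimeError("could not find a port free for both protocols")

tcp_sock, udp_sock = bind_tcp_and_udp_same_port()
same_port = tcp_sock.getsockname()[1] == udp_sock.getsockname()[1]
different_protocols = tcp_sock.type != udp_sock.type
print("same port number:", same_port)
print("different protocols:", different_protocols)
tcp_sock.close()
udp_sock.close()
```

Both binds succeed because the operating system keeps separate port spaces for TCP and UDP.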
Within the operating system and the application that created a socket, the socket is referred to by a unique integer number called the socket identifier or socket number. The operating system forwards the payload of incoming IP packets to the corresponding application by extracting the socket address information from the IP and transport protocol headers and stripping the headers from the application data.
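As a small illustration, Python exposes this integer through the `fileno()` method of a socket object. On Unix-like systems it is a file descriptor, and on Windows it is the underlying socket handle. The variable names are chosen for this sketch only.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first, \
     socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as second:
    # Each open socket gets its own integer identifier within the process.
    id_first, id_second = first.fileno(), second.fileno()
    print("socket identifiers:", id_first, id_second)
```

The actual numbers depend on the platform and on what else the process has open.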
In IETF Request for Comments, Internet Standards, in many textbooks, as well as in this article, the term socket refers to an entity that is uniquely identified by the socket number. In other textbooks, the socket term refers to a local socket address, i.e. a "combination of an IP address and a port number". In the original definition of socket given in RFC 147, as it was related to the ARPA network in 1971, "the socket is specified as a 32 bit number with even sockets identifying receiving sockets and odd sockets identifying sending sockets." Today, however, socket communications are bidirectional.
On Unix-like and Microsoft Windows-based operating systems, the netstat command-line tool may be used to list all currently established sockets and related information.
Socket types
There are several Internet socket types available:
- Datagram sockets, also known as connectionless sockets, which use User Datagram Protocol (UDP).
- Stream sockets, also known as connection-oriented sockets, which use Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP).
- Raw sockets (or Raw IP sockets), typically available in routers and other network equipment. Here the transport layer is bypassed, and the packet headers are not stripped off, but are accessible to the application. Application examples are Internet Control Message Protocol (ICMP, best known for the Ping suboperation), Internet Group Management Protocol (IGMP), and Open Shortest Path First (OSPF).
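The socket types listed above map onto constants in Berkeley-style APIs. The following sketch uses Python's `socket` module. Creating a raw socket usually requires administrator privileges, so the sketch treats failure as a normal outcome rather than an error.

```python
import socket

stream = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # stream socket (TCP)
datagram = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # datagram socket (UDP)
kinds = (stream.type, datagram.type)
print("types:", kinds)

# Raw sockets bypass the transport layer; ordinary users are usually refused.
try:
    raw = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    raw_permitted = True
    raw.close()
except OSError:
    raw_permitted = False
print("raw socket permitted:", raw_permitted)

stream.close()
datagram.close()
```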
There are also non-Internet sockets, implemented over other transport protocols, such as Systems Network Architecture (SNA). See also Unix domain sockets (UDS), for internal inter-process communication.
Socket states and the client-server model
Computer processes that provide application services are called servers, and create sockets on start up that are in listening state. These sockets are waiting for initiatives from client programs. For a listening TCP socket, the remote address presented by netstat may be denoted 0.0.0.0 and the remote port number 0.
A TCP server may serve several clients concurrently, by creating a child process for each client and establishing a TCP connection between the child process and the client. Unique dedicated sockets are created for each connection. These are in established state, when a socket-to-socket virtual connection or virtual circuit (VC), also known as a TCP session, is established with the remote socket, providing a duplex byte stream.
Other possible TCP socket states presented by the netstat command are Syn-sent, Syn-Recv, Fin-wait1, Fin-wait2, Time-wait, Close-wait and Closed which relate to various start up and shutdown steps.
A server may create several concurrently established TCP sockets with the same local port number and local IP address, each mapped to its own server-child process, serving its own client process. They are treated as different sockets by the operating system, since the remote socket address (the client IP address and/or port number) is different; i.e. since they have different socket pair tuples (see below).
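This behaviour can be observed with a short sketch in which two clients connect to one listening socket on loopback. For brevity the sketch handles both connections in a single process instead of forking child processes, which is a simplification of the model described above.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # port 0: let the OS choose a free port
listener.listen(5)
server_address = listener.getsockname()

# Two clients connect to the same server address.
clients = [socket.create_connection(server_address) for _ in range(2)]
# accept() returns a new, dedicated socket for each established connection.
established = [listener.accept()[0] for _ in clients]

socket_pairs = [(conn.getsockname(), conn.getpeername()) for conn in established]
for local, remote in socket_pairs:
    print("local", local, "remote", remote)

for sock in clients + established + [listener]:
    sock.close()
```

Each established socket reports the same local address but a different remote address, which is what distinguishes the connections.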
A UDP socket cannot be in an established state, since UDP is connectionless. Therefore, netstat does not show the state of a UDP socket. A UDP server does not create new child processes for every concurrently served client, but the same process handles incoming data packets from all remote clients sequentially through the same socket. This implies that UDP sockets are not identified by the remote address, but only by the local address, although each message has an associated remote address.
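A minimal sketch of this pattern, assuming loopback delivery and a two-second timeout chosen for the example: a single UDP socket receives datagrams from two clients, and `recvfrom` reports the remote address of each message.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)              # avoid blocking forever if a datagram is lost
server_address = server.getsockname()

clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
for number, client in enumerate(clients):
    client.sendto(f"hello from client {number}".encode(), server_address)

# The same server socket handles every client, one datagram at a time.
senders = []
for _ in clients:
    data, remote_address = server.recvfrom(1024)
    senders.append(remote_address)
    print(data.decode(), "from", remote_address)

for sock in clients + [server]:
    sock.close()
```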
Socket pairs
Communicating local and remote sockets are called socket pairs. Each socket pair is described by a unique 4-tuple consisting of source and destination IP addresses and port numbers, i.e. of local and remote socket addresses. As seen in the discussion above, in the TCP case, each unique socket pair 4-tuple is assigned a socket number, while in the UDP case, each unique local socket address is assigned a socket number.
Implementation issues
Sockets are usually implemented by an API library such as Berkeley sockets, first introduced in 1983. Most implementations are based on Berkeley sockets, for example Winsock introduced in 1991. Other socket API implementations exist, such as the STREAMS-based Transport Layer Interface (TLI).
Development of application programs that utilize this API is called socket programming or network programming.
These are examples of functions or methods typically provided by the API library:
- socket creates a new socket of a certain socket type, identified by an integer number, and allocates system resources to it.
- bind is typically used on the server side, and associates a socket with a socket address structure, i.e. a specified local port number and IP address.
- listen is used on the server side, and causes a bound TCP socket to enter listening state.
- connect is used on the client side, and assigns a free local port number to a socket. In case of a TCP socket, it causes an attempt to establish a new TCP connection.
- accept is used on the server side. It accepts a received incoming attempt to create a new TCP connection from the remote client, and creates a new socket associated with the socket address pair of this connection.
- send and recv, or write and read, or recvfrom and sendto, are used for sending and receiving data to/from a remote socket.
- close causes the system to release resources allocated to a socket. In case of TCP, the connection is terminated.
- gethostbyname and gethostbyaddr are used to resolve host names and addresses.
- select is used to prune a provided list of sockets for those that are ready to read, ready to write or have errors.
- poll is used to check on the state of a socket. The socket can be tested to see if it can be written to, read from or has errors.
- epoll is used to monitor the state of several sockets. Only sockets whose state has changed are added to a list by epoll_wait.
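The typical order of these calls can be sketched with a one-shot TCP echo exchange on loopback, written with Python's `socket` module, which wraps the Berkeley-style functions. The helper `serve_once` and the use of a thread in place of a separate server process are assumptions made to keep the example self-contained.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection and echo back whatever the client sends."""
    connection, remote_address = listener.accept()    # accept: new socket per connection
    with connection:
        while True:
            data = connection.recv(1024)              # recv
            if not data:                              # empty read: client finished sending
                break
            connection.sendall(data)                  # send

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket
listener.bind(("127.0.0.1", 0))                                 # bind
listener.listen(1)                                              # listen
server_address = listener.getsockname()

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_address)                                  # connect
client.sendall(b"ping")
client.shutdown(socket.SHUT_WR)     # tell the server no more data will follow

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    reply += chunk

client.close()                                                  # close
worker.join()
listener.close()
print(reply)
```

The client half-closes the connection with `shutdown` so that the server's read loop ends and both sides can close cleanly.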
Sockets in network equipment
The socket is primarily a concept used in the Transport Layer of the Internet model. Networking equipment such as routers and switches do not require implementations of the Transport Layer, as they operate on the Link Layer level (switches) or at the Internet Layer (routers). However, stateful network firewalls, network address translators, and proxy servers keep track of active socket pairs. Also in fair queuing, layer 3 switching and quality of service (QoS) support in routers, packet flows may be identified by extracting information about the socket pairs.
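As a rough sketch of flow identification, the following groups packets by protocol and socket pair and totals the bytes per flow. The packet records are hypothetical, already-parsed header fields invented for this illustration (using documentation and private address ranges), and `bytes_per_flow` is not taken from any real device or library.

```python
from collections import defaultdict

# Hypothetical parsed headers:
# (protocol, source IP, source port, destination IP, destination port, size in bytes)
packets = [
    ("TCP", "10.0.0.2", 50000, "192.0.2.10", 80, 1200),
    ("TCP", "10.0.0.2", 50001, "192.0.2.10", 80, 400),
    ("TCP", "10.0.0.2", 50000, "192.0.2.10", 80, 800),
    ("UDP", "10.0.0.3", 53000, "192.0.2.53", 53, 60),
]

def bytes_per_flow(packets):
    """Total the bytes seen for each (protocol, socket pair) combination."""
    totals = defaultdict(int)
    for protocol, src_ip, src_port, dst_ip, dst_port, size in packets:
        flow_key = (protocol, src_ip, src_port, dst_ip, dst_port)
        totals[flow_key] += size
    return dict(totals)

flows = bytes_per_flow(packets)
for key, total in flows.items():
    print(key, total)
```

A real firewall or NAT device would also track connection state and timeouts, which this sketch leaves out.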
Raw sockets are typically available in network equipment, and used for routing protocols such as IGMP and OSPF, and in Internet Control Message Protocol (ICMP).
Early implementations
1983: Berkeley sockets (also known as the BSD socket API) originated with the 4.2BSD Unix operating system (released in 1983) as an API. Only in 1989, however, could UC Berkeley release versions of its operating system and networking library free from the licensing constraints of AT&T's copyright-protected Unix.
1987: Transport Layer Interface (TLI) was the networking API provided by AT&T UNIX System V Release 3 (SVR3) in 1987 and continued into Release 4 (SVR4).
Other early implementations were written for TOPS-20, MVS, VM, and IBM-DOS (PCIP).
See also
- SOCKS
- Internet Protocol
- Internet protocol suite
- Packet
- Raw socket
- TCP and UDP port numbers
- Unix domain socket for a similar abstraction for local communication
- Named pipe for one-way communication