Socket UDP one socket, different ports - sockets

I'm new to socket programming, so forgive me if this question is basic; I couldn't find an answer anywhere.
What constitutes requiring a new socket?
For example, it appears possible to send and receive with the same socket fd on the same port. Could you send on port XXXX and receive on port YYYY with one socket? If not, then are sockets specific to host/port combinations?
Thanks for the insight!

A socket establishes an "endpoint", which consists of an IP address and a port:
http://www.gsp.com/cgi-bin/man.cgi?topic=socket
Yes, a single socket is specific to a single host/port combination.
READING RECOMMENDATION:
Beej's Guide to Network Programming:
http://beej.us/guide/bgnet/
Unix Network Programming: Stevens et al:
http://www.amazon.com/Unix-Network-Programming-Volume-Networking/dp/0131411551

The port number is a local property that identifies the socket endpoint for incoming data destined for that port on the receiver's machine. Each machine has 64K ports for each protocol type (TCP or UDP) and for each address family (IPv4 or IPv6).
With UDP, it is possible to send to (and receive from) many clients sitting on different ports. So, consider the following arrangement:
UDP socketA (port p1) <---------> UDP socketB (port p2)
    |
    |
    |
    |
UDP socketC (port p3)
Thus, socketA can send data to socketB and socketC even though they are sitting on different port numbers. The way it works is that with UDP sockets we typically use the sendto() API, which lets us specify an IP address and a port number for each packet. So we can send one packet to socketB's port, the next packet to socketC's port, and so on.
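For illustration, here is a minimal sketch of that pattern: one unconnected UDP socket sending to two different destination hosts/ports with sendto(). The addresses and ports are made-up example values, and error handling is omitted.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    /* One unconnected UDP socket can address many destinations. */
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in b, c;
    memset(&b, 0, sizeof(b));
    b.sin_family = AF_INET;
    b.sin_port = htons(5001);                       /* socketB's port p2 (example value) */
    inet_pton(AF_INET, "192.0.2.10", &b.sin_addr);  /* socketB's host (example address) */

    memset(&c, 0, sizeof(c));
    c.sin_family = AF_INET;
    c.sin_port = htons(5002);                       /* socketC's port p3 (example value) */
    inet_pton(AF_INET, "192.0.2.11", &c.sin_addr);  /* socketC's host (example address) */

    /* The destination is chosen per packet, so the same socket reaches both. */
    sendto(s, "to B", 4, 0, (struct sockaddr *)&b, sizeof(b));
    sendto(s, "to C", 4, 0, (struct sockaddr *)&c, sizeof(c));
    return 0;
}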
With TCP, however, this is not the case. Once a connection is established, say between socketA and socketB, there is no way for either of these sockets to talk to socketC:
TCP socketA (port p1) <---------> TCP socketB (port p2)
    |
    |
    |
    |
TCP socketC (port p3)

Related

How does a TCP server handle multiple sockets listening on the same port? [duplicate]

This might be a very basic question but it confuses me.
Can two different connected sockets share a port? I'm writing an application server that should be able to handle more than 100k concurrent connections, and we know that the number of ports available on a system is around 60k (16-bit). A connected socket is assigned to a new (dedicated) port, so it means that the number of concurrent connections is limited by the number of ports, unless multiple sockets can share the same port. Hence the question.
TCP / HTTP Listening On Ports: How Can Many Users Share the Same Port
So, what happens when a server listens for incoming connections on a TCP port? For example, let's say you have a web server on port 80. Let's assume that your computer has the public IP address of 24.14.181.229 and the person that tries to connect to you has IP address 10.1.2.3. This person can connect to you by opening a TCP socket to 24.14.181.229:80. Simple enough.
Intuitively (and wrongly), most people assume that it looks something like this:
Local Computer | Remote Computer
--------------------------------
<local_ip>:80 | <foreign_ip>:80
^^ not actually what happens, but this is the conceptual model a lot of people have in mind.
This is intuitive, because from the standpoint of the client, he has an IP address and connects to a server at IP:PORT. Since the client connects to port 80, his port must be 80 too? This is a sensible thing to think, but it is not what actually happens. If it were correct, we could only serve one user per foreign IP address: once a remote computer connected, it would hog the port-80-to-port-80 connection, and no one else could connect.
Three things must be understood:
1.) On a server, a process is listening on a port. Once it gets a connection, it hands it off to another thread. The communication never hogs the listening port.
2.) Connections are uniquely identified by the OS by the following 5-tuple: (local-IP, local-port, remote-IP, remote-port, protocol). If any element in the tuple is different, then this is a completely independent connection.
3.) When a client connects to a server, it picks a random, unused high-numbered (ephemeral) source port. This way, a single client can have up to ~64k connections to the server for the same destination port.
So, this is really what gets created when a client connects to a server:
Local Computer | Remote Computer | Role
-----------------------------------------------------------
0.0.0.0:80 | <none> | LISTENING
127.0.0.1:80 | 10.1.2.3:<random_port> | ESTABLISHED
Looking at What Actually Happens
First, let's use netstat to see what is happening on this computer. We will use port 500 instead of 80 (because a whole bunch of stuff is happening on port 80 as it is a common port, but functionally it does not make a difference).
netstat -atnp | grep -i ":500 "
As expected, the output is blank. Now let's start a web server:
sudo python3 -m http.server 500
Now, here is the output of running netstat again:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
So now there is one process that is actively listening (State: LISTEN) on port 500. The local address is 0.0.0.0, which is code for "listening on all IP addresses". An easy mistake to make is to listen only on 127.0.0.1, which will only accept connections from the current computer. This is not a connection; it just means that a process asked to bind() to that IP and port, and that process is responsible for handling all connections to that port. This hints at the limitation that there can only be one process per computer listening on a given port (there are ways to get around that using multiplexing, but that is a much more complicated topic). If a web server is listening on port 80, it cannot share that port with other web servers.
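As a rough sketch of what that bind()/listen() step looks like in C (error handling omitted; port 500 as in the example above, which requires root since it is below 1024; make_listener is just an illustrative name):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int make_listener(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(500);
    /* INADDR_ANY == 0.0.0.0: accept connections arriving on any local address.
       Using inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) instead would
       accept connections from this machine only. */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    /* bind() claims the port; a second process attempting the same bind
       will fail with EADDRINUSE. */
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    /* The backlog (16 here) is the queue of not-yet-accepted connections. */
    listen(s, 16);
    return s;
}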
So now, let's connect a user to our machine:
quicknet -m tcp -t localhost:500 -p Test payload.
This is a simple script (https://github.com/grokit/quickweb) that opens a TCP socket, sends the payload ("Test payload." in this case), waits a few seconds and disconnects. Doing netstat again while this is happening displays the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:54240 ESTABLISHED -
If you connect with another client and do netstat again, you will see the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:26813 ESTABLISHED -
... that is, the client used another random source port for this connection. So there is never any confusion between connections, even when they come from the same IP address: the source port tells them apart.
A server socket listens on a single port. All established client connections on that server are associated with that same listening port on the server side of the connection. An established connection is uniquely identified by the combination of client-side and server-side IP/Port pairs. Multiple connections on the same server can share the same server-side IP/Port pair as long as they are associated with different client-side IP/Port pairs, and the server would be able to handle as many clients as available system resources allow it to.
On the client-side, it is common practice for new outbound connections to use a random client-side port, in which case it is possible to run out of available ports if you make a lot of connections in a short amount of time.
A connected socket is assigned to a new (dedicated) port
That's a common intuition, but it's incorrect. A connected socket is not assigned to a new/dedicated port. The only actual constraint that the TCP stack must satisfy is that the tuple of (local_address, local_port, remote_address, remote_port) must be unique for each socket connection. Thus the server can have many TCP sockets using the same local port, as long as each of the sockets on the port is connected to a different remote location.
See the "Socket Pair" paragraph in the book "UNIX Network Programming: The sockets networking API" by
W. Richard Stevens, Bill Fenner, Andrew M. Rudoff at: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA52&dq=socket%20pair%20tuple&pg=PA52#v=onepage&q=socket%20pair%20tuple&f=false
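To make the rule concrete, here is a small sketch of the server side of it: every socket returned by accept() reports the same local port with getsockname(), while getpeername() shows a different (remote IP, remote port) for each client. It assumes an already-listening IPv4 socket (for example, one created like the listener sketch earlier); serve is just an illustrative name.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

void serve(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;

        struct sockaddr_in local, peer;
        socklen_t llen = sizeof(local), plen = sizeof(peer);
        getsockname(conn, (struct sockaddr *)&local, &llen); /* always the same local port */
        getpeername(conn, (struct sockaddr *)&peer, &plen);  /* differs for each client */

        char lip[INET_ADDRSTRLEN], pip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &local.sin_addr, lip, sizeof(lip));
        inet_ntop(AF_INET, &peer.sin_addr, pip, sizeof(pip));
        printf("local %s:%d <-> remote %s:%d\n",
               lip, ntohs(local.sin_port), pip, ntohs(peer.sin_port));
        /* ... hand 'conn' off to a worker thread/process here ... */
    }
}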
Theoretically, yes; in practice, no. Most kernels (including Linux) do not allow a second bind() to an already allocated port (without options such as SO_REUSEADDR). It would not have been a big patch to allow it.
Conceptually, we should differentiate between a socket and a port. Sockets are bidirectional communication endpoints, i.e. "things" where we can send and receive bytes. A socket is a conceptual thing; there is no field in a packet header named "socket".
A port is an identifier that is capable of identifying a socket. In the case of TCP, a port is a 16-bit integer, but there are other protocols as well (for example, with Unix domain sockets, a "port" is essentially a string).
The main problem is the following: when an incoming packet arrives, the kernel has to identify its socket by something. The destination port number is the most common way, but it is not the only possibility:
Sockets can be identified by the destination IP of the incoming packets. This is the case, for example, if we have a server using two IPs simultaneously. Then we can run, for example, different web servers on the same port, but on different IPs.
Sockets can be identified by their source port and IP as well. This is the case in many load-balancing configurations.
Because you are writing an application server, it will be able to do that.
I guess none of the answers tells every detail of the process, so here it goes.
Consider an HTTP server:
It asks the OS to bind port 80 to one or many IP addresses (if you choose 127.0.0.1, only local connections are accepted; you can choose 0.0.0.0 to bind to all IP addresses: localhost, local network, wide area network, both IP versions).
When a client connects to that port, the connection attempt sits in the listening socket's backlog queue until the server accepts it (that is why the socket has a backlog: accepting connections is not instantaneous).
The OS then hands the accepted connection to the application as a new socket (via accept()). That connection keeps using local port 80; it is distinguished from every other connection by the client's IP and port.
The listening socket is then free to accept the next connection waiting in the backlog.
When the client or server disconnects, the connection lingers for a while (CLOSE_WAIT on the side that received the close, TIME_WAIT on the side that closed first). That allows stray packets still in flight to be flushed. The TIME_WAIT state lasts 2 * MSL seconds (and it does consume some memory while it waits).
After that wait, the connection's (client IP, client port, server IP, server port) combination is free to be reused.
So the server-side port is shared by many connections; it is the full 4-tuple, not the port alone, that must be unique.
No, it is not possible for two listeners to share the same port at the same instant, but you can design your application so that they use the port at different times.
Absolutely not; multiple connections may even share the same port, but they will have different remote IP addresses or ports.

On successful TCP connection between server and client

RELATED POST
The post here in a UNIX forum describes:
The server will keep on listening on a port number.
The server will accept a client's connect() request using accept(). As soon as the server accepts the client request, the kernel allocates a random port number for the server for further send() and receive(), since the same port number on the server can't be used for sending as well as listening, and the previous port is still listening for new connections.
QUESTION
I have a server application S which is constantly listening on port 18333 (this is actually bitcoind testnet). When another client node C connects to it from, say, port 53446 (a random port), then according to the above post, S should only be able to send/receive C's data through some port other than 18333.
But when I run a bitcoind testnet node, it communicates perfectly with the other node using only the one socket connection on port 18333, with no need for another port for sending/receiving. Below is a snippet, and I even verified this:
bitcoin-cli -testnet -rpcport=16591 -datadir=/home/user/mytest/1/
{
"id": 1,
"addr": "178.32.61.149:18333"
}
Can anyone help me understand what is the right working in TCP socket connection?
A TCP connection is identified by a socket pair and this is uniquely identified by 4 parameters :
source ip
source port
dest ip
dest port
For every connection that is established to a server, the listening socket is basically cloned and the same local port is used. So for every connection you have a socket using the same server port; you have n+1 sockets using that port when there are n connections.
The TCP stack is able to distinguish between all these sockets and connections because a socket is either in the listening state, or it belongs to a socket pair where all 4 parameters are considered.
Your second bullet is therefore wrong, because the same port is used, as I explained above.
The server will accept a client's connect() request using accept(). As soon as the server accepts the client request, the kernel allocates a random port number for the server for further send() and receive().
On normal TCP traffic this is not the case. If a web server is listening on port 80, all packets sent back to the client will come from server port 80 (this can be verified with Wireshark, for example), but there will be a different socket for each connection (srcIP:port - dstIP:port). That information is carried in the headers of the network packets: the IP addresses and protocol code (TCP, UDP or other) in the IP header, and the port numbers in the TCP or UDP header.
But changing ports can happen when communicating over FTP, where there can be a control port (usually 21) and a separately negotiated data port.

Transmission Control Protocol socket

When I open a TCP connection with the server (at layer 7 of OSI), layer 5 creates a socket with a port number and IP.
I want to know whether this socket includes my IP / the server's IP, and my (random) port or the server's port (e.g. 80 for HTTP).
And when I open TCP with the server, we open TCP together.
So does that mean we have a common socket?
When I open a TCP connection with the server (at layer 7 of OSI)
Forget about OSI. It is obsolete, and TCP/IP doesn't follow it. It has its own layer model.
Layer 5 creates a socket with a port number and IP
TCP creates it at the TCP layer.
I want to know whether this socket includes my IP / the server's IP, and my (random) port or the server's port (e.g. 80 for HTTP)
All of the above.
And when I open TCP with the server, we open TCP together. So does that mean we have a common socket?
No. A socket is an endpoint of a connection. There are two ends, and two sockets.
TCP sits at what is called Layer 4, the Transport Layer, so ignore the OSI model for the time being.
Generally, "a socket" is just an endpoint without any identity. The socket gets its identity when you bind to an address or connect to an address.
When you bind to an address, you only get your local port and local IP address in its endpoint, but not the remote IP and port. As such, the socket is not very useful unless you listen on it; this is typically done on the server. Also note that you can bind to "all addresses on the machine", in which case you don't really have a single endpoint per se.
When you connect to a server (say, a TCP server at port 80), your OS TCP/IP stack uses a local IP address and chooses a random port to connect to the server socket (like the one listening above). This is when all 4 addresses come into the picture: this is a connected socket, and all 4 values will be present.
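A small sketch of that from the client side: connect() without an explicit bind(), then getsockname() reveals the local IP and the ephemeral port the stack picked, while getpeername() would show the server's IP and port, completing the 4 values. The server address 192.0.2.1:80 is a made-up example, and error handling is minimal.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(80);                        /* server port */
    inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);  /* example server IP */

    /* No bind(): the stack picks a local IP and an unused ephemeral port. */
    if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) == 0) {
        struct sockaddr_in local;
        socklen_t len = sizeof(local);
        getsockname(s, (struct sockaddr *)&local, &len);
        printf("local port chosen by the OS: %d\n", ntohs(local.sin_port));
        /* getpeername(s, ...) would return 192.0.2.1:80, the other half of the pair. */
    }
    return 0;
}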

How TCP/UDP demultiplexing works?

I have the following statement.
"In TCP, the receiver host uses all of source IP, source port, destination IP and destination port to direct datagram to appropriate socket. While in UDP, the receiver only checks destination port number to direct the datagram. "
Is the above statement true?
If yes, does it mean that in TCP the same port can be used for multiple sockets in one process, while in UDP only one socket can use a port in one process? What about sockets in different processes? Can multiple processes use the same port in TCP/UDP? (in programming language: C/C++/Java)
If not, why?
"In TCP, the receiver host uses all of source IP, source port, destination IP and destination port to direct datagram to appropriate socket. While in UDP, the receiver only checks destination port number to direct the datagram. "
Is the above statement true?
Yes.
If yes, does it mean that in TCP the same port can be used for multiple sockets in one process,
Yes, under some circumstances.
while in UDP only one socket can use a port in one process?
No, see below.
What about sockets in different processes? Can multiple processes use the same port in TCP/UDP? (in programming language: C/C++/Java)
Under some circumstances, yes. A UDP port has to be designated as reusable by all processes that want to share it. A TCP port can only be reused by sockets bound to different interfaces: there is no sharing.
What that means is, in TCP, a unique communication "channel" can be described as the four-tuple: (src-ip, src-port, dst-ip, dst-port).
In UDP, all packets destined to a certain port are delivered to the only UDP socket listening on that port, regardless of the source address and port of said packet. I like to think of it as a funnel.
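A sketch of that "funnel": one UDP socket bound to a port receives datagrams from any number of senders, and recvfrom() reports each sender's address and port. Port 9999 is an arbitrary example and error handling is minimal.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9999);          /* example port */
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    char buf[1500];
    for (;;) {
        struct sockaddr_in src;
        socklen_t slen = sizeof(src);
        /* Every datagram sent to port 9999 lands here, whoever sent it. */
        ssize_t n = recvfrom(s, buf, sizeof(buf), 0,
                             (struct sockaddr *)&src, &slen);
        if (n < 0)
            break;
        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &src.sin_addr, ip, sizeof(ip));
        printf("%zd bytes from %s:%d\n", n, ip, ntohs(src.sin_port));
    }
    return 0;
}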

What does it mean to bind a multicast (UDP) socket?

I am using multicast UDP between hosts that have multiple network interfaces.
I am using boost::asio, and am confused by the 2 operations receivers have to make: bind, then join-group.
Why do you need to specify the local address of an interface, during bind, when you do that with every multicast group that you join?
The sister question regards the multicast port: since, when sending, you send to a multicast address & port, why, when subscribing to a multicast group, do you specify only the address and not the port, the port being specified in the confusing call to bind?
Note: the "join-group" is a wrapper over setsockopt(IP_ADD_MEMBERSHIP), which as documented, may be called multiple times on the same socket to subscribe to different groups (over different networks?). It would therefore make perfect sense to ditch the bind call and specify the port every time I subscribe to a group.
From what I see, always binding to "0.0.0.0" and specifying the interface address when joining the group, works very well. Confused.
To bind a UDP socket when receiving multicast means to specify an address and port from which to receive data (NOT a local interface, as is the case for TCP acceptor bind). The address specified in this case has a filtering role, i.e. the socket will only receive datagrams sent to that multicast address & port, no matter what groups are subsequently joined by the socket. This explains why when binding to INADDR_ANY (0.0.0.0) I received datagrams sent to my multicast group, whereas when binding to any of the local interfaces I did not receive anything, even though the datagrams were being sent on the network to which that interface corresponded.
Quoting from UNIX® Network Programming Volume 1, Third Edition: The Sockets Networking API by W.R Stevens.
21.10. Sending and Receiving
[...] We want the receiving socket to bind the multicast group and
port, say 239.255.1.2 port 8888. (Recall that we could just bind the
wildcard IP address and port 8888, but binding the multicast address
prevents the socket from receiving any other datagrams that might
arrive destined for port 8888.) We then want the receiving socket to
join the multicast group. The sending socket will send datagrams to
this same multicast address and port, say 239.255.1.2 port 8888.
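A receiver following that advice (bind to the group address and port, then join the group) might look roughly like the sketch below. The group 239.255.1.2:8888 matches the Stevens example quoted above; the interface choice (INADDR_ANY) is an assumption, error handling is omitted, and binding to the group address as shown is the Unix-style behaviour described here (some stacks require binding to INADDR_ANY instead).

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int make_mcast_receiver(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    /* Allow several receivers on the same machine to bind the same group/port. */
    int on = 1;
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    /* Bind to the multicast group address and port (239.255.1.2:8888). Binding
       the group address, rather than the wildcard, filters out unrelated
       datagrams that happen to arrive for port 8888. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8888);
    inet_pton(AF_INET, "239.255.1.2", &addr.sin_addr);
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    /* Join the group so the NIC/kernel accept traffic for the group's MAC address. */
    struct ip_mreq mreq;
    inet_pton(AF_INET, "239.255.1.2", &mreq.imr_multiaddr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);  /* let the kernel pick the interface */
    setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    return s;
}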
The "bind" operation is basically saying, "use this local UDP port for sending and receiving data. In other words, it allocates that UDP port for exclusive use for your application. (Same holds true for TCP sockets).
When you bind to "0.0.0.0" (INADDR_ANY), you are basically telling the TCP/IP layer to use all available adapters for listening and to choose the best adapter for sending. This is standard practice for most socket code. The only time you wouldn't specify 0 for the IP address is when you want to send/receive on a specific network adapter.
Similarly if you specify a port value of 0 during bind, the OS will assign a randomly available port number for that socket. So I would expect for UDP multicast, you bind to INADDR_ANY on a specific port number where multicast traffic is expected to be sent to.
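As a small illustration of the port-0 case (a sketch; error handling omitted), binding with port 0 and then calling getsockname() shows the port the OS picked:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* all adapters */
    addr.sin_port = htons(0);                 /* 0 = let the OS pick a free port */
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    socklen_t len = sizeof(addr);
    getsockname(s, (struct sockaddr *)&addr, &len);
    printf("OS assigned port %d\n", ntohs(addr.sin_port));
    return 0;
}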
The "join multicast group" operation (IP_ADD_MEMBERSHIP) is needed because it basically tells your network adapter to listen not only for ethernet frames where the destination MAC address is your own, it also tells the ethernet adapter (NIC) to listen for IP multicast traffic as well for the corresponding multicast ethernet address. Each multicast IP maps to a multicast ethernet address. When you use a socket to send to a specific multicast IP, the destination MAC address on the ethernet frame is set to the corresponding multicast MAC address for the multicast IP. When you join a multicast group, you are configuring the NIC to listen for traffic sent to that same MAC address (in addition to its own).
Without the hardware support, multicast wouldn't be any more efficient than plain broadcast IP messages. The join operation also tells your router/gateway to forward multicast traffic from other networks. (Anyone remember MBONE?)
If you join a multicast group, all the multicast traffic for all ports on that IP address will be received by the NIC. Only the traffic destined for your bound listening port will get passed up the TCP/IP stack to your app. As for why ports are specified during a multicast subscription: it's because a multicast IP is just that, an IP only; "ports" are a property of the upper protocols (UDP and TCP).
You can read more about how multicast IP addresses map to multicast ethernet addresses at various sites. The Wikipedia article is about as good as it gets:
The IANA owns the OUI MAC address 01:00:5e, therefore multicast
packets are delivered by using the Ethernet MAC address range
01:00:5e:00:00:00 - 01:00:5e:7f:ff:ff. This is 23 bits of available
address space. The first octet (01) includes the broadcast/multicast
bit. The lower 23 bits of the 28-bit multicast IP address are mapped
into the 23 bits of available Ethernet address space.
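That 23-bit mapping is easy to compute; here is a small sketch (the helper name mcast_ip_to_mac is made up):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>

/* Map an IPv4 multicast address to its Ethernet multicast MAC:
   01:00:5e followed by the low 23 bits of the IP address. */
static void mcast_ip_to_mac(const char *ip, unsigned char mac[6])
{
    struct in_addr a;
    inet_pton(AF_INET, ip, &a);
    uint32_t host = ntohl(a.s_addr);

    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (host >> 16) & 0x7f;  /* only 23 bits survive: the top 5 bits of the
                                      28-bit group ID are dropped */
    mac[4] = (host >> 8) & 0xff;
    mac[5] = host & 0xff;
}

int main(void)
{
    unsigned char mac[6];
    mcast_ip_to_mac("239.255.1.2", mac);
    printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    /* Prints 01:00:5e:7f:01:02 */
    return 0;
}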
A correction for "What does it mean to bind a multicast (udp) socket?", as it is only partially true in the following quote:
The "bind" operation is basically saying, "use this local UDP port for sending and receiving data". In other words, it allocates that UDP port for exclusive use by your application.
There is one exception. Multiple applications can share the same port for listening (usually it has practical value for multicast datagrams), if the SO_REUSEADDR option applied. For example
int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); // create UDP socket somehow
...
int set_option_on = 1;
// it is important to do "reuse address" before bind, not after
int res = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (char*) &set_option_on,
sizeof(set_option_on));
res = bind(sock, src_addr, len);
If several processes do such "reuse binding", then every multicast or broadcast UDP datagram received on that shared port will be delivered to each of the processes (which is what makes this a natural fit for multicast traffic).
Here are further details regarding what happens in a few cases (a small demonstration follows the list):
an attempt of any bind ("exclusive" or "reuse") to a free port will succeed
an attempt at an "exclusive" bind will fail if the port is already "reuse-bound"
an attempt at a "reuse" bind will fail if some process holds an "exclusive" bind on the port
It is also very important to distinguish a SENDING multicast socket from a RECEIVING multicast socket.
I agree with all the answers above regarding RECEIVING multicast sockets.
The OP noted that binding a RECEIVING socket to an interface did not help.
However, it is necessary to bind a multicast SENDING socket to an interface.
For a SENDING multicast socket on a multi-homed server, it is very important to create a separate socket for each interface you want to send to. A bound SENDING socket should be created for each interface.
// This is a fix for that bug that causes Servers to pop offline/online.
// Servers will intermittently pop offline/online for 10 seconds or so.
// The bug only happens if the machine had a DHCP gateway, and the gateway is no longer accessible.
// After several minutes, the route to the DHCP gateway may timeout, at which
// point the pingponging stops.
// You need 3 machines, Client machine, server A, and server B
// Client has both ethernets connected, and both ethernets receiving CITP pings (machine A pinging to en0, machine B pinging to en1)
// Now turn off the ping from machine B (en1), but leave the network connected.
// You will notice that the machine transmitting on the interface with
// the DHCP gateway will fail sendto() with errno 'No route to host'
if ( theErr == 0 )
{
    // inspired by 'ping -b' option in man page:
    //   -b boundif
    //      Bind the socket to interface boundif for sending.
    struct sockaddr_in bindInterfaceAddr;
    bzero(&bindInterfaceAddr, sizeof(bindInterfaceAddr));
    bindInterfaceAddr.sin_len = sizeof(bindInterfaceAddr);
    bindInterfaceAddr.sin_family = AF_INET;
    bindInterfaceAddr.sin_addr.s_addr = htonl(interfaceipaddr);
    bindInterfaceAddr.sin_port = 0; // Allow the kernel to choose a random port number by passing in 0 for the port.
    theErr = bind(mSendSocketID, (struct sockaddr *)&bindInterfaceAddr, sizeof(bindInterfaceAddr));

    struct sockaddr_in serverAddress;
    int namelen = sizeof(serverAddress);
    if (getsockname(mSendSocketID, (struct sockaddr *)&serverAddress, (socklen_t *)&namelen) < 0)
    {
        DLogErr(@"ERROR Publishing service... getsockname err");
    }
    else
    {
        DLog( @"socket %d bind, %@ port %d", mSendSocketID, [NSString stringFromIPAddress:htonl(serverAddress.sin_addr.s_addr)], htons(serverAddress.sin_port) );
    }
}
Without this fix, multicast sending will intermittently get sendto() errno 'No route to host'.
If anyone can shed light on why unplugging a DHCP gateway causes Mac OS X multicast SENDING sockets to get confused, I would love to hear it.