Can be two identical socket established? - sockets

A socket is combination of an IP address, a transport protocol and a port number.
For example: two hosts, on different LANs behind NAT, can have the same IP (let's say 192.168.0.2).
When these hosts connect to a web server, could two identical sockets be established?
I know that ports are randomly generated, but on different hosts it could be generated the same one.
Or it is server which tells that this port number is already in use?
Or is it NAT device, which changing SRC IP in IP packet?
How does NAT device knows then, where to send packet back to host on LAN?
Thanks

As you correctly guessed it is NAT who assigns ephemeral port number, so they can't coincide. Therefore, from server perspective, destination duplet of host #1 would be something like 192.168.0.2:46812, and for host #2 - 192.168.0.2:51378. When NAT receives these packets, it knows which host behind the NAT the packet belongs to based on the port number. If you have access to machine's console you can check the numbers with netstat -anp

Related

Sending a UDP datagram to a shared IP address and port

Suppose that two computers use the same Wi-Fi to access the Internet. Each of these computers has the same program installed, which is bound to the same UDP port. I want to know, since both computers have the same external IP address and listen to the same port but on different machines, what will be the result if a UDP datagram is sent to this common external address and to a common port, then which machine will receive it and how to send it each machine its own personal datagram?
The router will not forward the packet to either computer, since it doesn't know which one it should forward to.
In fact, even if the program was only running on one computer, the router still wouldn't forward the packet. It has to see outbound traffic going from the computer to the outside world first, before it decides which external port to use for forwarding inbound traffic back to the computer. And the router might not decide to use the same port on the public IP that the computer used on the private IP.
This is why everyone hates NAT and likes IPv6.

Why we use the local IP address in identifying sockets?

When a server want to create a socket, it will use a combination of its IP address and some well-known port, let us say 80. So, when a packet arrived, both the server IP and port 80 will be used to decide whether the packet goes to that socket or not.
The question is why do we need to check the IP address of the server, since the packet (aka datagram) passed the network layer check and was certainly destined for this server. In other words, the network layer will not pass the packet to transport layer if the destination IP is not the server IP, so why do we use the IP address in the socket?
And if a host (a client or a server) created multiple sockets (network processes) using both its IP and some port numbers, is there any case where the IP could be different in these sockets?
Thanks in advance!
Why do we need to check the IP address of the server, since the packet (aka datagram) passed the network layer?
The Data Link Layer uses Media Access Control (MAC) addresses to direct packets. When a packet arrives at your computer operating system (OS), it arrived either because the MAC address matched the hardware address or it was a broadcast (ff:ff:ff:ff:ff:ff).
Once the packet is received, your OS determines if it is destined for an IP address assigned to the computer. At this point, the OS has several options:
If the IP address matches an assigned IP, deliver to any waiting applications or reject the packet and handle any needed Internet Control Message Protocol (ICMP) required.
Should the IP not match an assigned, your OS checks if IP routing is enabled. Then either rejects the packet issuing any required reply or forwards the packet to the destination IP in the routing table by creating a new packet targeting the MAC address of the destination router.
If a host (a client or a server) created multiple sockets (network processes) using both its IP and some port numbers, is there any case where the IP could be different in these sockets?
If your OS assigns more than one IP address to an interface, all of those IP addresses would be available to be used. You can open sockets using any available IP (usually INADDR_ANY or similar). In a listening context, your port will be available to every IP address assigned. In a transmitting context, your IP will be set depending on the outbound interface.

How does a TCP server handle multiple Sockets to listening to the same port? [duplicate]

This might be a very basic question but it confuses me.
Can two different connected sockets share a port? I'm writing an application server that should be able to handle more than 100k concurrent connections, and we know that the number of ports available on a system is around 60k (16bit). A connected socket is assigned to a new (dedicated) port, so it means that the number of concurrent connections is limited by the number of ports, unless multiple sockets can share the same port. So the question.
TCP / HTTP Listening On Ports: How Can Many Users Share the Same Port
So, what happens when a server listen for incoming connections on a TCP port? For example, let's say you have a web-server on port 80. Let's assume that your computer has the public IP address of 24.14.181.229 and the person that tries to connect to you has IP address 10.1.2.3. This person can connect to you by opening a TCP socket to 24.14.181.229:80. Simple enough.
Intuitively (and wrongly), most people assume that it looks something like this:
Local Computer | Remote Computer
--------------------------------
<local_ip>:80 | <foreign_ip>:80
^^ not actually what happens, but this is the conceptual model a lot of people have in mind.
This is intuitive, because from the standpoint of the client, he has an IP address, and connects to a server at IP:PORT. Since the client connects to port 80, then his port must be 80 too? This is a sensible thing to think, but actually not what happens. If that were to be correct, we could only serve one user per foreign IP address. Once a remote computer connects, then he would hog the port 80 to port 80 connection, and no one else could connect.
Three things must be understood:
1.) On a server, a process is listening on a port. Once it gets a connection, it hands it off to another thread. The communication never hogs the listening port.
2.) Connections are uniquely identified by the OS by the following 5-tuple: (local-IP, local-port, remote-IP, remote-port, protocol). If any element in the tuple is different, then this is a completely independent connection.
3.) When a client connects to a server, it picks a random, unused high-order source port. This way, a single client can have up to ~64k connections to the server for the same destination port.
So, this is really what gets created when a client connects to a server:
Local Computer | Remote Computer | Role
-----------------------------------------------------------
0.0.0.0:80 | <none> | LISTENING
127.0.0.1:80 | 10.1.2.3:<random_port> | ESTABLISHED
Looking at What Actually Happens
First, let's use netstat to see what is happening on this computer. We will use port 500 instead of 80 (because a whole bunch of stuff is happening on port 80 as it is a common port, but functionally it does not make a difference).
netstat -atnp | grep -i ":500 "
As expected, the output is blank. Now let's start a web server:
sudo python3 -m http.server 500
Now, here is the output of running netstat again:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
So now there is one process that is actively listening (State: LISTEN) on port 500. The local address is 0.0.0.0, which is code for "listening for all ip addresses". An easy mistake to make is to only listen on port 127.0.0.1, which will only accept connections from the current computer. So this is not a connection, this just means that a process requested to bind() to port IP, and that process is responsible for handling all connections to that port. This hints to the limitation that there can only be one process per computer listening on a port (there are ways to get around that using multiplexing, but this is a much more complicated topic). If a web-server is listening on port 80, it cannot share that port with other web-servers.
So now, let's connect a user to our machine:
quicknet -m tcp -t localhost:500 -p Test payload.
This is a simple script (https://github.com/grokit/quickweb) that opens a TCP socket, sends the payload ("Test payload." in this case), waits a few seconds and disconnects. Doing netstat again while this is happening displays the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:54240 ESTABLISHED -
If you connect with another client and do netstat again, you will see the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:26813 ESTABLISHED -
... that is, the client used another random port for the connection. So there is never confusion between the IP addresses.
A server socket listens on a single port. All established client connections on that server are associated with that same listening port on the server side of the connection. An established connection is uniquely identified by the combination of client-side and server-side IP/Port pairs. Multiple connections on the same server can share the same server-side IP/Port pair as long as they are associated with different client-side IP/Port pairs, and the server would be able to handle as many clients as available system resources allow it to.
On the client-side, it is common practice for new outbound connections to use a random client-side port, in which case it is possible to run out of available ports if you make a lot of connections in a short amount of time.
A connected socket is assigned to a new (dedicated) port
That's a common intuition, but it's incorrect. A connected socket is not assigned to a new/dedicated port. The only actual constraint that the TCP stack must satisfy is that the tuple of (local_address, local_port, remote_address, remote_port) must be unique for each socket connection. Thus the server can have many TCP sockets using the same local port, as long as each of the sockets on the port is connected to a different remote location.
See the "Socket Pair" paragraph in the book "UNIX Network Programming: The sockets networking API" by
W. Richard Stevens, Bill Fenner, Andrew M. Rudoff at: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA52&dq=socket%20pair%20tuple&pg=PA52#v=onepage&q=socket%20pair%20tuple&f=false
Theoretically, yes. Practice, not. Most kernels (incl. linux) doesn't allow you a second bind() to an already allocated port. It weren't a really big patch to make this allowed.
Conceptionally, we should differentiate between socket and port. Sockets are bidirectional communication endpoints, i.e. "things" where we can send and receive bytes. It is a conceptional thing, there is no such field in a packet header named "socket".
Port is an identifier which is capable to identify a socket. In case of the TCP, a port is a 16 bit integer, but there are other protocols as well (for example, on unix sockets, a "port" is essentially a string).
The main problem is the following: if an incoming packet arrives, the kernel can identify its socket by its destination port number. It is a most common way, but it is not the only possibility:
Sockets can be identified by the destination IP of the incoming packets. This is the case, for example, if we have a server using two IPs simultanously. Then we can run, for example, different webservers on the same ports, but on the different IPs.
Sockets can be identified by their source port and ip as well. This is the case in many load balancing configurations.
Because you are working on an application server, it will be able to do that.
I guess none of the answers tells every detail of the process, so here it goes:
Consider an HTTP server:
It asks the OS to bind the port 80 to one or many IP addresses (if you choose 127.0.0.1, only local connections are accepted. You can choose 0.0.0.0 to bind to all IP addresses (localhost, local network, wide area network, both IP versions)).
When a client connects to that port, it WILL lock it up for a while (that's why the socket has a backlog: it queues a number of connection attempts, because they ARE NOT instantaneous).
The OS then chooses a random port and transfer that connection to that port (think of it as a temporary port that will handle all the traffic from now on).
The port 80 is then released for the next connection (first, it will accept the first one in the backlog).
When client or server disconnects, the random port is held open for a while (CLOSE_WAIT in the remote side, TIME_WAIT in the local side). That allows flushing some lost packets along the path. The default time for that state is 2 * MSL seconds (and it WILL consume memory while is waiting).
After that waiting, that random port is free again to receive other connections.
So, TCP cannot even share a port amongst two IP's!
No. It is not possible to share the same port at a particular instant. But you can make your application such a way that it will make the port access at different instant.
Absolutely not, because even multiple connections may shave same ports but they'll have different IP addresses

What does it mean to bind() a socket to any address other than localhost?

I don't understand what it means to bind a socket to any address other than 127.0.0.1 (or ::1, etc.).
Am I not -- by definition -- binding the socket to a port on my own machine.. which is localhost?
What sense does it make to bind or listen to another machine or IP address's port?
Conceptually, it just doesn't make sense to me!
(This has proven surprisingly hard to Google... possibly because I'm not Googling the right terms.)
Binding of a socket is done to address and port in order to receive data on this socket (most cases) or to use this address/port as the source of the data when sending data (for example used with data connections in FTP server).
Usually there are several interfaces on a specific machine, i.e. the pseudo-interface loopback where the machine can reach itself, ethernet, WLAN, VPN... . Each of these interfaces can have multiple IP addresses assigned. For example, loopback usually has 127.0.0.1 and with IPv6 also ::1, but you can assign others too. Ethernet or WLAN have the IP addresses on the local network, i.e. 172.16.0.34 or whatever.
If you bind a socket for receiving data to a specific address you can only receive data sent to this specific IP address. For example, if you bind to 127.0.0.1 you will be able to receive data from your own system but not from some other system on the local network, because they cannot send data to your 127.0.0.1: for one any data to 127.0.0.1 will be sent to their own 127.0.0.1 and second your 127.0.0.1 is an address on your internal loopback interface which is not reachable from outside.
You can also bind a socket to a catch-all address like 0.0.0.0 (Ipv4) and :: (Ipv6). In this case it is not bound to a specific IP address but will be able to receive data send to any IP address of the machine.

Demultiplexing in TCP/UDP

I know there is an older answer to this question here, though it does not seem to answer my question. If in UDP two people with different IP and different ports send data to the same server (same IP) at the same socket (since in UDP there is only one socket per application - correct me if i am wrong), how does server recognises which person is who?
Does it change anything if the two people use (by luck or not) the same port as source port but with different source IP?
The server can receive UDP datagrams from two different IP/port pairs (IP could be same, port could be same, or both could be different) on the same port. The recvfrom() function returns the source IP/port of the datagram in addition to the data.
As mentioned in the question you referenced, a UDP socket is defined only by the local IP and local port. The remote IP and port can differ for both outgoing and incoming packets.