number of simultaneous tcp/ip connections on 32 bit linux - sockets

I have a specific question about implementing a load balancer or a server program that speaks TCP/IP.
Since port number is 16 bits, there are a max of only 65536 ports on a single Linux box at any given time.
And TCP/IP needs a port number to talk to the outside world.
1) when a client establishes a connection, an ephemeral port number is chosen.
2) when a server listening on a socket accepts a connection, a port number is assigned.
So in my understanding at any given time only maximum 65536 TCP/IP connections can exist on a given machine.
So how is it that some or most load balancers claim 200,000 or more concurrent connections?
Can someone please explain that?
Also regarding load balancers, once a load balancer has forwarded a request to one of the servers behind it, can the load balancer somehow pass some information to it, that will help the server to respond back to the originating client directly to avoid the latency of sending back the response via the load balancer?
Thanks everyone for your help.
Thambi

Since port number is 16 bits, there are a max of only 65536 ports on a single Linux box at any given time.
65535 actually, as you can't use port zero.
when a server listening on a socket accepts a connection, a port number is assigned.
No it isn't. The incoming connection uses the same port it connected to. No new port is assigned on accept().
So in my understanding at any given time only maximum 65536 TCP/IP connections can exist on a given machine.
No, see above. The actual limit is determined by kernel and process resources: open FDs, thread stack memory, kernel buffer space, ... Not by the 16-bit port number.
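To make the "open FDs" part concrete, here is a minimal sketch (assuming a Unix-like system, since the `resource` module is not available on Windows) that reads the per-process limit on open file descriptors. The helper name `fd_limits` is just for illustration.

```python
import resource  # Unix-only standard-library module


def fd_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    # Every open TCP socket consumes one descriptor, so the soft limit
    # caps how many connections a single process can hold open.
    print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is usually far below 65535 by default and can be raised up to the hard limit (for example with `ulimit -n`), which is one reason this limit tends to be hit long before port numbers matter.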

A TCP connection is uniquely identified by a (local IP address, local port, remote IP address, remote port) tuple.
For a typical server application, there is only one to three local IP addresses and one or two local ports. For example, a web server might listen on local addresses ::1, ::ffff:93.184.216.34, and 2606:2800:220:1:248:1893:25c8:1946 (possibly via wildcard addresses, but that's irrelevant), and local ports 80 and 443.
For the simple case of a single local address and port, that's still 2^(128 + 16) possible connections (less a few for special-purpose and broadcast addresses), which will be problematic if you wish to communicate with the entire Earth in units of less than 4 million atoms (which might be possible if you converted all matter on Earth into small viruses).
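The size of that number is easy to check. A small sketch, where the constant names are mine and the 128-bit address assumes IPv6:

```python
ADDRESS_BITS = 128  # size of an IPv6 address
PORT_BITS = 16      # size of a TCP port number

# Upper bound on the distinct (remote address, remote port) pairs that can
# connect to one fixed (local address, local port) pair.
remote_endpoints = 2 ** (ADDRESS_BITS + PORT_BITS)

print(f"{remote_endpoints:.3e}")  # prints 2.230e+43
```

With IPv4 the same calculation gives 2 ** (32 + 16), which is still about 2.8e14 remote endpoints per listening address and port.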

There has been some confusion around this question, so I'll try to explain it with examples.
First a couple of words about ports: everyone knows that they don't exist physically; they are just an extra piece of identification information for a connection, and also a way to allow multiple servers to listen on the same address (if there were no concept of a port, only one server could listen on one address, or some other mechanism would have to be in place). Also, a port is an unsigned short, so it can have values between 0 and 65535 (64k).
Now, the restriction about ports: it applies on the server side when binding. A (server) socket (let's call it SS) can bind to an address and port, and only one socket can listen on a particular address and port at a time (unless an option such as SO_REUSEPORT on Linux, or SO_REUSEADDR on Windows, is set before binding), so if someone is already listening on a port you can't listen on it too. There are some well-known ports (e.g. sshd - 22, httpd - 80, RDP - 3389, ...) that should be avoided when creating SS; a general guideline is never to use a port number < 1k. For a complete list of "reserved" ports, visit www.iana.org.
As stated in the link I posted in the comment, a 5-item tuple (2 pairs + 1 additional element) identifies a connection: (LocalIP: LocalPort, RemoteIP: RemotePort, Protocol) (the last member is included just for rigor; at this point we don't care about it). Now for a particular SS that listens on an IP:Port, one of the 2 pairs will be the same for all the clients (client sockets: CS) that connect to it, depending on where the connection is viewed from:
server's endpoint: LocalIP: LocalPort
client's endpoint: RemoteIP: RemotePort
(just like looking in the mirror).
Now I'm going to exemplify on 2 machines (Centos(192.168.149.43) - server and Windows(192.168.137.10) - client). I created a dummy TCP server in Python (note that the code is not structured, no exception handling, only IPv4 capable, the purpose is not to have a Python class but to see some socket behavior):
import sys
import select
import socket

HOST = "192.168.149.43"
PORT = 4461
WAIT_TIME = 0.5

if __name__ == "__main__":
    conns = list()
    nconns = 0
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen(0xFF)
    print("Entering loop, press a key to exit...")
    # Poll stdin without blocking (Unix only); any pending input ends the loop.
    while sys.stdin not in select.select([sys.stdin], [], [], 0)[0]:
        # Wait up to WAIT_TIME seconds for a pending connection.
        if select.select([srv], [], [], WAIT_TIME)[0]:
            conn = srv.accept()  # (socket, (client_ip, client_port))
            print("Accepted connection from: (%s, %d)" % conn[1])
            conns.append(conn)
            nconns += 1
            print("Active connections:", nconns)
    # Close every accepted socket, then the listening socket.
    for item in conns:
        item[0].close()
    srv.close()
    print("Exiting.")
Here's the netstat output on the server machine (before running the server app). I chose port 4461 for communication:
[cfati#xobved-itaf:~]> netstat -an | grep 4461
[cfati#xobved-itaf:~]>
So nothing related to this port. Now after starting the server (I had to trim some spaces so that the output fits here):
[cfati#xobved-itaf:~]> netstat -anp | grep 4461
tcp 0 0 192.168.149.43:4461 0.0.0.0:* LISTEN
As you can see there is a socket listening for connections on port 4461.
Now going on the client machine and starting the Python interpreter, running the following code in the console:
>>> import sys
>>> import socket
>>> HOST = "192.168.149.43"
>>> PORT = 4461
>>>
>>> def create(no=1):
...     ret = []
...     for i in range(no):
...         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...         s.connect((HOST, PORT))
...         ret.append(s)
...     return ret
...
>>> sockets = []
>>> sockets.extend(create())
Just after typing the last line, we look at the server output on the server machine:
Accepted connection from: (192.168.137.10, 64218)
Active connections: 1
And the corresponding netstat output:
[cfati#xobved-itaf:~]> netstat -an | grep 4461
tcp 0 0 192.168.149.43:4461 0.0.0.0:* LISTEN
tcp 0 0 192.168.149.43:4461 192.168.137.10:64218 ESTABLISHED
You see the ESTABLISHED connection (this is the accepted socket - AS): The connection was initiated from 192.168.137.10 on port 64218, to 192.168.149.43 on port 4461.
Here's the corresponding netstat output on the client machine (after creating the connection):
e:\Work\Dev>netstat -an | findstr 4461
TCP 192.168.137.10:64218 192.168.149.43:4461 ESTABLISHED
As you can see the Local and Remote (IP/Port) pairs (compared to the output on the server machine) are reversed (like I mentioned above about looking in the mirror). If I go again on the client machine in the interpreter and re-run the last line (create a new connection):
>>> sockets.extend(create())
the output of the server app will show another entry:
Accepted connection from: (192.168.137.10, 64268)
Active connections: 2
while the netstat output on the server machine:
[cfati#xobved-itaf:~]> netstat -an | grep 4461
tcp 0 0 192.168.149.43:4461 0.0.0.0:* LISTEN
tcp 0 0 192.168.149.43:4461 192.168.137.10:64268 ESTABLISHED
tcp 0 0 192.168.149.43:4461 192.168.137.10:64218 ESTABLISHED
I'm not posting what netstat will output on the client machine since it's obvious.
Now, let's look at the 2 pairs each corresponding to an active connection: 192.168.137.10:64268, 192.168.137.10:64218. The 2 ports are returned by the accept function (Ux or Win) called on SS.
The 2 ports (64268 and 64218) are used by connections, but that doesn't mean that they cannot be used anymore. Other socket servers can listen on them (I am talking here in the server machine context), or they can be present as used ports in connections from other addresses. Here's a hypothetical netstat output:
tcp 0 0 192.168.149.43:4461 192.168.137.45:64218 ESTABLISHED
So, port 64218 can be also present in a connection from 192.168.137.45 (note that I changed the last IP byte).
In conclusion, you were partly right: there can't be more than 65535 (excluding 0, as specified in the other solution) simultaneous connections from the same client IP address to a given server address and port. This is a huge number and I don't know whether it is ever reached in the real world, but even so, there are "tricks" to get around it (one example is to have 2+ SSs listening on 2+ different ports and to configure the client so that, if connecting to one port fails, it uses another; the maximum number of simultaneous connections from the same address then grows by a factor equal to the number of ports we have servers listening on).
Load balancers handle connections from multiple addresses, so their number can easily grow to hundreds of thousands.
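The whole experiment can also be reproduced on a single machine over the loopback interface. This is a minimal sketch, not the code used for the output above: the function name `demo` and the use of 127.0.0.1 with an OS-chosen port are my own assumptions.

```python
import socket


def demo(n_clients=3):
    """Open n_clients loopback connections to a single listening socket."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    srv.listen(n_clients)
    listen_port = srv.getsockname()[1]

    clients, accepted = [], []
    for _ in range(n_clients):
        c = socket.create_connection(("127.0.0.1", listen_port))
        conn, _peer = srv.accept()
        clients.append(c)
        accepted.append(conn)

    # Local port of every accepted socket, and the remote (client) port.
    local_ports = [a.getsockname()[1] for a in accepted]
    remote_ports = [a.getpeername()[1] for a in accepted]

    for s in clients + accepted + [srv]:
        s.close()
    return listen_port, local_ports, remote_ports


if __name__ == "__main__":
    listen_port, local_ports, remote_ports = demo()
    print("listening port:      ", listen_port)
    print("accepted local ports:", local_ports)
    print("client ports:        ", remote_ports)
```

Every accepted socket reports the listening port as its local port; only the client-side port differs from one connection to the next.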

Related

What will happen if a server machine listens on more than 65536 ports?

In the case of Storm, it says "Each worker uses a single port for receiving messages, and this setting defines which ports are open for use.", which means I can set the slot number to a number greater than 65536 so that a server machine could open more than 65536 processes and each of them listens on a unique port?
It's not how many tcp connections at a time, I know that tcp connections can be more than 65536.
So what will happen if a server machine opens more than 65536 processes and listens on more than 65536 ports?
When a process listens on a TCP port, it doesn't just listen on a port number. It listens on a combination of port number and IP address. Wildcard IP addresses can be used to indicate listening on all IP addresses.
For local connections, you can use IP addresses like 127.0.0.2 or 127.1.0.1 in combination with various port numbers to exceed 65,536 local ports. You can have more than 65,536 ports in this way and it works fine. However, IP addresses inside 127.0.0.0/8 cannot be reached from other machines, so it's local only.
To have a port that can accept inbound connections over a network, you must bind it to a combination of an IP address that is reachable over that network with a TCP port number. So a machine with only one network address can only have 65,535 distinct ports that incoming TCP connections can be made to. But you can exceed this by assigning additional network IP addresses to the machine, assuming the software you are using allows you to specify the IP address to bind to. (If not, you can easily hack it to allow that.)
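As a rough illustration of binding the same port number on two different addresses, here is a sketch that assumes Linux, where every address in 127.0.0.0/8 is usable on the loopback interface without extra configuration. The helper names are hypothetical.

```python
import socket


def listen_on(address, port):
    """Return a TCP socket listening on the given address and port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((address, port))
    s.listen(1)
    return s


def same_port_two_addresses():
    """Listen on one port number under two different loopback addresses."""
    first = listen_on("127.0.0.1", 0)        # OS picks a free port
    port = first.getsockname()[1]
    second = listen_on("127.0.0.2", port)    # same port, different address
    names = (first.getsockname(), second.getsockname())
    first.close()
    second.close()
    return names


if __name__ == "__main__":
    print(same_port_two_addresses())
```

Both sockets listen at the same time because the kernel keys listeners on the (address, port) combination, not on the port number alone.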

How does a TCP server handle multiple sockets listening on the same port? [duplicate]

This might be a very basic question but it confuses me.
Can two different connected sockets share a port? I'm writing an application server that should be able to handle more than 100k concurrent connections, and we know that the number of ports available on a system is around 60k (16bit). A connected socket is assigned to a new (dedicated) port, so it means that the number of concurrent connections is limited by the number of ports, unless multiple sockets can share the same port. So the question.
TCP / HTTP Listening On Ports: How Can Many Users Share the Same Port
So, what happens when a server listen for incoming connections on a TCP port? For example, let's say you have a web-server on port 80. Let's assume that your computer has the public IP address of 24.14.181.229 and the person that tries to connect to you has IP address 10.1.2.3. This person can connect to you by opening a TCP socket to 24.14.181.229:80. Simple enough.
Intuitively (and wrongly), most people assume that it looks something like this:
Local Computer | Remote Computer
--------------------------------
<local_ip>:80 | <foreign_ip>:80
^^ not actually what happens, but this is the conceptual model a lot of people have in mind.
This is intuitive, because from the standpoint of the client, he has an IP address, and connects to a server at IP:PORT. Since the client connects to port 80, then his port must be 80 too? This is a sensible thing to think, but actually not what happens. If that were to be correct, we could only serve one user per foreign IP address. Once a remote computer connects, then he would hog the port 80 to port 80 connection, and no one else could connect.
Three things must be understood:
1.) On a server, a process is listening on a port. Once it gets a connection, it hands it off to another thread. The communication never hogs the listening port.
2.) Connections are uniquely identified by the OS by the following 5-tuple: (local-IP, local-port, remote-IP, remote-port, protocol). If any element in the tuple is different, then this is a completely independent connection.
3.) When a client connects to a server, it picks a random, unused high-order source port. This way, a single client can have up to ~64k connections to the server for the same destination port.
So, this is really what gets created when a client connects to a server:
Local Computer | Remote Computer | Role
-----------------------------------------------------------
0.0.0.0:80 | <none> | LISTENING
24.14.181.229:80 | 10.1.2.3:<random_port> | ESTABLISHED
Looking at What Actually Happens
First, let's use netstat to see what is happening on this computer. We will use port 500 instead of 80 (because a whole bunch of stuff is happening on port 80 as it is a common port, but functionally it does not make a difference).
netstat -atnp | grep -i ":500 "
As expected, the output is blank. Now let's start a web server:
sudo python3 -m http.server 500
Now, here is the output of running netstat again:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
So now there is one process that is actively listening (State: LISTEN) on port 500. The local address is 0.0.0.0, which is code for "listening on all IP addresses". An easy mistake to make is to listen only on address 127.0.0.1, which will only accept connections from the current computer. So this is not a connection; it just means that a process requested to bind() to an IP and port, and that process is responsible for handling all connections to that port. This hints at the limitation that there can only be one process per computer listening on a port (there are ways to get around that using multiplexing, but this is a much more complicated topic). If a web-server is listening on port 80, it cannot share that port with other web-servers.
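The "one listener per address and port" limitation is easy to observe. The sketch below assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT) and a Linux-like system; the function name is made up for the example.

```python
import errno
import socket


def second_bind_errno():
    """Bind two sockets to one address and port; return the second's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    first.listen(1)
    address = first.getsockname()

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(address)
        return None                # the OS allowed the second bind
    except OSError as exc:
        return exc.errno
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print("second bind failed with EADDRINUSE:", code == errno.EADDRINUSE)
```

This is the same "Address already in use" error a second web server would report if it tried to start on a port that is already taken.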
So now, let's connect a user to our machine:
quicknet -m tcp -t localhost:500 -p Test payload.
This is a simple script (https://github.com/grokit/quickweb) that opens a TCP socket, sends the payload ("Test payload." in this case), waits a few seconds and disconnects. Doing netstat again while this is happening displays the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:54240 ESTABLISHED -
If you connect with another client and do netstat again, you will see the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:26813 ESTABLISHED -
... that is, the client used another random port for the connection. So there is never confusion between the connections.
A server socket listens on a single port. All established client connections on that server are associated with that same listening port on the server side of the connection. An established connection is uniquely identified by the combination of client-side and server-side IP/Port pairs. Multiple connections on the same server can share the same server-side IP/Port pair as long as they are associated with different client-side IP/Port pairs, and the server would be able to handle as many clients as available system resources allow it to.
On the client-side, it is common practice for new outbound connections to use a random client-side port, in which case it is possible to run out of available ports if you make a lot of connections in a short amount of time.
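On Linux, the range those client-side ports are drawn from can be inspected. A small sketch, assuming the usual procfs location (the file does not exist on other systems, in which case the helper returns None):

```python
from pathlib import Path

RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")  # Linux only


def ephemeral_port_range():
    """Return (low, high) of the ephemeral port range, or None if unavailable."""
    try:
        low, high = RANGE_FILE.read_text().split()
    except (OSError, ValueError):
        return None
    return int(low), int(high)


if __name__ == "__main__":
    print(ephemeral_port_range())
```

The size of that range bounds how many simultaneous outbound connections one source IP can have to a single destination IP and port, which is the situation where a busy client or proxy can actually run out of ports.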
A connected socket is assigned to a new (dedicated) port
That's a common intuition, but it's incorrect. A connected socket is not assigned to a new/dedicated port. The only actual constraint that the TCP stack must satisfy is that the tuple of (local_address, local_port, remote_address, remote_port) must be unique for each socket connection. Thus the server can have many TCP sockets using the same local port, as long as each of the sockets on the port is connected to a different remote location.
See the "Socket Pair" paragraph in the book "UNIX Network Programming: The sockets networking API" by W. Richard Stevens, Bill Fenner, Andrew M. Rudoff at: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA52&dq=socket%20pair%20tuple&pg=PA52#v=onepage&q=socket%20pair%20tuple&f=false
Theoretically, yes. In practice, no. Most kernels (including Linux) don't allow a second bind() to an already allocated port, although it wouldn't take a really big patch to allow it.
Conceptually, we should differentiate between a socket and a port. Sockets are bidirectional communication endpoints, i.e. "things" where we can send and receive bytes. A socket is a conceptual thing; there is no field named "socket" in a packet header.
A port is an identifier that can identify a socket. In the case of TCP, a port is a 16-bit integer, but there are other protocols as well (for example, on Unix sockets, a "port" is essentially a string).
The main problem is the following: when an incoming packet arrives, the kernel can identify its socket by the destination port number. This is the most common way, but it is not the only possibility:
Sockets can be identified by the destination IP of the incoming packets. This is the case, for example, if we have a server using two IPs simultaneously. Then we can run, for example, different web servers on the same ports, but on different IPs.
Sockets can be identified by their source port and ip as well. This is the case in many load balancing configurations.
Because you are working on an application server, it will be able to do that.
I guess none of the answers tells every detail of the process, so here it goes:
Consider an HTTP server:
It asks the OS to bind the port 80 to one or many IP addresses (if you choose 127.0.0.1, only local connections are accepted. You can choose 0.0.0.0 to bind to all IP addresses (localhost, local network, wide area network, both IP versions)).
When a client connects to that port, it WILL lock it up for a while (that's why the socket has a backlog: it queues a number of connection attempts, because they ARE NOT instantaneous).
The OS then chooses a random port and transfer that connection to that port (think of it as a temporary port that will handle all the traffic from now on).
The port 80 is then released for the next connection (first, it will accept the first one in the backlog).
When client or server disconnects, the random port is held open for a while (CLOSE_WAIT in the remote side, TIME_WAIT in the local side). That allows flushing some lost packets along the path. The default time for that state is 2 * MSL seconds (and it WILL consume memory while is waiting).
After that waiting, that random port is free again to receive other connections.
So, TCP cannot even share a port amongst two IP's!
No. It is not possible to share the same port at a particular instant. But you can write your application in such a way that it accesses the port at different instants.
Absolutely not, because even though multiple connections may share the same port, they'll have different IP addresses

On successful TCP connection between server and client

RELATED POST
The post here in the UNIX forum describes:
The server will keep on listening on a port number.
The server will accept a clients connect() request using accept(). As soon as the server accepts the client request, the kernel allocates a random port number for the server for further send() and receive(), since the same port number on the server can't be used for sending as well as listening, and the previous port is still listening for new connections
QUESTION
I have a server application S which is constantly listening on port 18333 (this is actually bitcoind testnet). When another client node C connects to it on, say, 53446 (a random port), then according to the above post S will be able to send/receive data of 'C' only from port 53446.
But when I run bitcoind on testnet, it communicates perfectly with other nodes using only one socket connection on port 18333, without needing another port for sending/receiving. Below is a snippet; I even verified this:
bitcoin-cli -testnet -rpcport=16591 -datadir=/home/user/mytest/1/
{
"id": 1,
"addr": "178.32.61.149:18333"
}
Can anyone help me understand what is the right working in TCP socket connection?
A TCP connection is identified by a socket pair and this is uniquely identified by 4 parameters :
source ip
source port
dest ip
dest port
For every connection that is established to a server, the socket is basically cloned and the same port is used. So for every connection you have a socket using the same server port, and you have n+1 sockets using the same port when there are n connections.
The kernel's TCP stack is able to distinguish between all these sockets and connections because a socket is either in the listening state, or it belongs to a socket pair where all 4 parameters are considered.
Your second bullet is therefore wrong, because the same port is being used, as I explained above.
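As a toy model of that distinction (this is an illustration in plain Python, not how any real kernel stores its tables, and all addresses are documentation examples I made up):

```python
LISTEN_PORT = 80  # hypothetical listening socket bound to any local IP, port 80

# Established connections keyed by (local_ip, local_port, remote_ip, remote_port).
connections = {
    ("192.0.2.1", 80, "198.51.100.7", 54240): "connection A",
    ("192.0.2.1", 80, "198.51.100.7", 26813): "connection B",
    ("192.0.2.1", 80, "203.0.113.9", 54240): "connection C",
}


def demultiplex(local_ip, local_port, remote_ip, remote_port):
    """Pick the socket that should receive an incoming segment."""
    key = (local_ip, local_port, remote_ip, remote_port)
    if key in connections:
        return connections[key]      # exact 4-tuple match wins
    if local_port == LISTEN_PORT:
        return "listening socket"    # no match: treat as a new connection attempt
    return None                      # nothing is bound to that port


if __name__ == "__main__":
    print(demultiplex("192.0.2.1", 80, "198.51.100.7", 26813))
    print(demultiplex("192.0.2.1", 80, "198.51.100.7", 40000))
```

Connections A and B share the server port and the client IP, and A and C share the server port and the client port, yet all three keys are distinct, so no second server port is ever needed.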
The server will accept a clients connect() request using accept(). As soon as the server accepts the client request, the kernel allocates a random port number for the server for further send() and receive().
In normal TCP traffic this is not the case. If a web server is listening on port 80, all packets sent back to the client will come from server port 80 (this can be verified with Wireshark, for example), but there will be a different socket for each connection (srcIP:port - dstIP:port). That information is sent in the headers of the network packets: IP addresses and the protocol code (TCP, UDP or other) in the IP header, port numbers in the TCP or UDP header.
But changing ports can happen when communicating over FTP, where there can be a control port (usually 21) and a negotiated data port.

Transmission Control Protocol socket

When I open TCP with the server (on 7 layer of OSI), the layer 5 create socket with port number and IP.
I want to know if this socket include my IP/the server IP, and my (random) port or the server port (e.g. 80 for HTTP)
And when I open TCP with server we open TCP together
So it's mean we have common socket?
When I open TCP with the server (on 7 layer of OSI)
Forget about OSI. It is obsolete, and TCP/IP doesn't follow it. It has its own layer model.
The layer 5 create socket with port number and IP
TCP creates it at the TCP layer.
I want to know if this socket include my IP/the server IP, and my (random) port or the server port (80 for HTTP for ex.)
All of the above.
And when I open TCP with server we open TCP together So it's mean we have common socket?
No. A socket is an endpoint of a connection. There are two ends, and two sockets.
TCP is a Layer 4 as it is called - or a Transport Layer, so ignore the OSI model for the time being.
Generally, 'a socket' is just an endpoint without any identity. The socket gets its identity when you bind to an address or connect to an address.
When you bind to an address, the endpoint only has your local port and local IP address, but not the remote IP address and port. As such, the socket is not very useful unless you listen on it. This is typically done on the server. Also note that you can bind to 'All Addresses on the machine', and then you really don't have any one endpoint per se.
When you connect to a server (say, a TCP server on port 80), your OS TCP/IP stack makes use of a local IP address and chooses a random port to connect to a server socket (like, say, the listening one above). This is when all 4 values come into the picture. This socket is a connected socket and all 4 values will be present.
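The stages above can be observed with getsockname() and getpeername(). A minimal loopback sketch; the function name and the choice of 127.0.0.1 with an OS-picked port are assumptions for the example:

```python
import socket


def identity_stages():
    """Show which address values a socket has after bind and after connect."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    bound = srv.getsockname()          # local IP and port only

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        cli.getpeername()
        had_peer = True
    except OSError:
        had_peer = False               # unconnected: no remote endpoint yet
    cli.connect(bound)
    connected = (cli.getsockname(), cli.getpeername())   # all 4 values
    cli.close()
    srv.close()
    return bound, had_peer, connected


if __name__ == "__main__":
    bound, had_peer, connected = identity_stages()
    print("listening socket:", bound)
    print("client had a peer before connect():", had_peer)
    print("client after connect():", connected)
```

The listening socket only ever knows its local address and port, while the connected client socket carries both its own (OS-chosen) endpoint and the server's.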

SSH session - fixed port on the client side

Is it possible to set the fixed port on the client side of the connection?
I connect to the ssh-server using port 22 and the client socket is getting random port to identify the session. An example (output from netstat -atn)
tcp4 0 0 <server>.22 <client>.54117 ESTABLISHED
In this example, client gets port 54117. For the test purposes, I'd like a fixed port to be assigned for the client, let's say 40185.
So I'd love the following output:
tcp4 0 0 <server>.22 <client>.40185 ESTABLISHED
Is it even possible?
You can do it programmatically, but the ssh(1) command doesn't allow you to do that. The main reason is that letting the kernel select the client port lets you open more than one ssh(1) session to the same server from different source ports on the same client machine. If you fix the port number on both the client and the server, you cannot distinguish the packets belonging to one connection from the ones belonging to the other (same protocol, TCP, same source address, same destination address, same source port and same destination port).
To do it programmatically in a client, call the bind(2) system call to fix the local port before calling connect(2) (just as the server calls bind(2) before accept(2)).
Be careful: you cannot have two connections with the same five parameters (source addr, source port, TCP protocol, dest port, dest addr).
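For what it's worth, here is roughly what bind-before-connect looks like in Python. It is a sketch with a hypothetical helper name, and the SO_REUSEADDR line is my own addition so the port can be reused while an earlier connection is still in TIME_WAIT.

```python
import socket


def connect_from_port(server_addr, local_port):
    """Connect to server_addr using local_port as the client-side port."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding while an earlier connection lingers in TIME_WAIT.
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    cli.bind(("", local_port))     # fix the local port before connecting
    cli.connect(server_addr)
    return cli


if __name__ == "__main__":
    # Loopback demonstration: a throwaway server plus a client on a fixed port.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))   # borrow a port number the OS considers free
    fixed_port = probe.getsockname()[1]
    probe.close()
    cli = connect_from_port(srv.getsockname(), fixed_port)
    print("client endpoint:", cli.getsockname())
    cli.close()
    srv.close()
```

With the port from the question, the call would look like connect_from_port((server, 22), 40185), but that only gives you a raw TCP connection to the SSH port, not an SSH session; a second simultaneous call with the same arguments fails because the tuple would no longer be unique.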