are socket ports the same as regular ports [duplicate] - sockets

This question already has answers here:
What is the theoretical maximum number of open TCP connections that a modern Linux box can have
(3 answers)
Does the port change when a server accepts a TCP connection?
(3 answers)
How does the socket API accept() function work?
(4 answers)
Closed 3 years ago.
I read something I found contradictory with my current understanding of ports. If you google "how many ports does a server have", the first thing to come up states the following:
The server generally only ever uses one port, no matter how many clients are connected. It is the tuple of (client IP, client port,
server IP, server port) that must be unique for each TCP connection -
so the limit of 65535 ports is only relevant for how many connections
a single client can make to a single server.
I thought each time a client establishes a connection to a server, then a socket is creating using a regular port for the connection between the two?
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?

I thought each time a client establishes a connection to a server, then a socket is creating using a regular port for the connection between the two?
The term "port" in this context is being used to describe, essentially, an address. The port number, along with the IP address, uniquely identifies one endpoint of the network.
Not only does the server endpoint generally only use a single port number, it would be a lot more difficult to make connections to the server if it didn't, because what port number would the client endpoint use to request the connection? DNS allows a client to look up the IP address, if the IP address is not already know, but there's no such facility for port numbers. So the port number has to be known in advance.
So, no…it is not the case that each time a client makes a connection, a socket is created using a "regular port" for the connection between the two. There's no "regular port". There's just "port", all ports are the same, and they are simply a number that identifies the endpoint's address.
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?
Yes, it can. On the server end, the port number is (generally) always the same. For example, an HTTP server will (generally) use port 80. The listening socket will have as its port number "80", as will the server-side socket for each connection.
The port number can be reused like this, because each socket has other identifying characteristics besides the IP address and port number. In particular, the server's listening socket is unique; there is only one socket on the server end that has that IP address, that port number, and which has no connections (i.e. is listening).
Once a connection is made, a new socket is created to represent that connection. And that socket can be uniquely identified, because unlike the listening socket, it does have a connection (i.e. a remote endpoint) associated with it, along with the IP address and port number. When the client endpoint sends data to the server, the network layer can tell which socket to which that should be delivered, because that data comes from a specific remote endpoint, which also has a unique IP address and port number.
The combination of the server's and client's unique IP addresses and port numbers uniquely identifies that connection, making it distinct from any other socket on the server that may have the same server-side endpoint's IP address and port number.
In the text you quoted, this part is describing exactly this distinct, unique identification of a socket:
It is the tuple of (client IP, client port, server IP, server port) that must be unique for each TCP connection
In this way, the server's IP address and port number can be used an indefinite number of times (not counting other constrained resources on the server, like memory and tables that hold the state of the network connections).
The limitation on port numbers only comes into play when trying to create additional listening sockets (for servers) or additional connections (for clients). Servers typically won't run out of port numbers unless they are implementing a protocol that requires the server to create a connection back to a client's listening socket (this is uncommon), and clients won't run out of port numbers unless they try to make a very large number of connections.
It is this latter limit that this part of the text you quoted is referring to:
the limit of 65535 ports is only relevant for how many connections a single client can make to a single server.

Related

Understanding of WebSockets

My understanding is that a socket corresponds to a network identifier, port and TCP identifier. [1]
Operating systems enable a process to be associated with a port (which IIUC is a way of making the process addressable on the network for inbound data).
So a WebSocket server will typically be associated with a port well-known for accepting and understanding HTTP for the upgrade request (like 443) and then use TCP identifiers to enable multiple network sockets to be open concurrently for a single server process and a single port.
Please can someone confirm or correct my understanding?
[1] "To provide for unique names at
each TCP, we concatenate a NETWORK identifier, and a TCP identifier
with a port name to create a SOCKET name which will be unique
throughout all networks connected together." https://www.rfc-editor.org/rfc/rfc675
When a client connects to your server on a given port, the client connection is coming from an IP address and a client-side port number. The client-side port number is automatically generated by the client and will be unique for that client. So, you end up with four items that make a connection.
Server IP address (well known to all clients)
Server port (well known to all clients)
Client IP address (unique for that client)
Client port (dynamically unique for that client and that socket)
So, it is the combination of these four items that make a unique TCP connection. If the same client makes a second connection to the same server and port, then that second connection will have a different client port number (each connection a client makes will be given a different client port number) and thus the combination of those four items above will be different for that second client connection, allowing it's traffic to be completely separate from the first connection that client made.
So, a TCP socket is a unique combination of the four items above. To see how that is used, let's look at how some traffic flows.
After a client connects to the server and a TCP socket is created to represent that connection, then the client sends a packet. The packet is sent from the client IP address and from the unique client port number that that particular socket is using. When the server receives that packet on its own port number, it can see that the packet is coming from the client IP address and from that particular client port number. It can use these items to look up in its table and see which TCP socket this traffic is associated with and trigger an event for that particular socket. This separates that client's traffic from all the other currently connected sockets (whether they are other connections from that same client or connections from other clients).
Now, the server wants to send a response to that client. The packet is sent to the client's IP address and client port number. The client TCP stack does the same thing. It receives the packet from the server IP/port and addressed to the specific client port number and can then associate that packet with the appropriate TCP socket on the client so it can trigger an event on the right socket.
All traffic can uniquely be associated with the appropriate client or server TCP socket in this way, even though many clients may connect to the same server IP and port. The uniqueness of the client IP/port allows both ends to tell which socket a given packet belongs to.
webSocket connections start out with an HTTP connection (which is a TCP socket running the HTTP protocol). That initial HTTP request contains an "upgrade" header requesting the server to upgrade the protocol from HTTP to webSocket. If the server agrees to the upgrade, then it returns a response that indicates that the protocol will be changed to the webSocket protocol. The TCP socket remains the same, but both sides agree that they will now speak the webSocket protocol instead of the HTTP protocol. So, once connected, you then have a TCP socket where both sides are speaking the webSocket protocol. This TCP connection uses the same logic described above to remain unique from other TCP connections to the same server.
In this manner, you can have a single server on a single port that works for both HTTP connections and webSocket connections. All connections to that server start out as HTTP connections, but some are converted to webSocket connections after both sides agree to change the protocol. The HTTP connections that remain HTTP connections will be typical request/response and then the socket will be closed. The HTTP connections that are "upgraded" to the webSocket protocol will remain open for the duration of the webSocket session (which can be long lived). You can have many concurrent open webSocket connections that are all distinct from one another while new HTTP connections are regularly serviced all by the same server. The TCP logic above is used to keep track of which packets to/from the same server/port belong to which connection.
FYI, you may have heard about NAT (Network Address Translation). This is commonly used to allow private networks (like a home or corporate network) to interface to a public network (like the internet). With NAT a server may see multiple clients as having the same client IP address even though they are physically different computers on a private network). With NAT, multiple computers are routed through a common IP address, but NAT still guarantees that the client IP address and client port number are still a unique combination so the above scheme still works. When using NAT an incoming packet destined for a particular client arrives at the shared IP address. The IP/port is then translated to the actual client IP address and port number on the private network and then packet is forwarded to that device. The server is generally unaware of this translation and packet forwarding. Because the NAT server still maintains the uniqueness of the client IP/client port combination, the server's logic still works just fine even though it appears that many clients are sharing a common IP address). Note, home network routes are usually configured to use NAT since all computers on the home network will "share" the one public IP address that your router has when accessing the internet.
You will not enable multiple sockets, there is no need for it. You will have multiple conections. It's a little different, but you undesrstand well. For UDP there's nothing to do, cause there is no connections.
In TCP, if two different machines connect to the same port on a third machine, there are two distinct connections because the source IPs differ. If the same machine (or two behind NAT or otherwise sharing the same IP address) connects twice to a single remote end, the connections are differentiated by source port, the same machine cannot open 2 connections on the same port.

Relation between sockets, ports and processes in Computer Networking [duplicate]

This question already has answers here:
why cannot we use process id insted of taking the port we are binding
(3 answers)
Closed 5 years ago.
I am very new to TCP/IP networks and learning about sockets/ports. I have a few confusions. I am mentioning what I understand.
A node N1 has multiple processes running. Say a process P1 has some string that it wishes to send to some other Node N2. N1 will request OS to create a socket which is essentially like a network I/O streaming channel. Such a channel will be created and handed over to the process along with a socket descriptor. So, we can say that socket can be recognised in the world by the node i.e. IP of node + process which requested the socket. Hence, comes the concept of socket address which is basically IP of node + port address (used for identifying processs). So, my doubts are:
From where comes the idea of ports here. Socket can be identified as IP of node + Process ID. Why ports are required to identify a process. Why can't the process descriptor be self sufficient. Why port address. Examples?
Why do we need to bind the socket with a socket address if the node has to just pass the data and nothing needs to be received. Binding of socket address essentially means "to start recognising socket with IP address of node + port address apart from its descriptor" which is useful for other nodes if they wish to send some data to Node N1. But what I think is that for any process in a node that wish to communicate over network, there should be one "global" socket which will not be binded. All processes will use it for sending data only. If in case any node wish to recieve data, they can have a separate socket which will be binded so that other nodes in network can recognise that particular socket.
Where exactly does TCP/UDP fit in the picture? Can I have two ports which are like TCP port 3000 and UDP port 3000 i.e. separate ports with different transport protocol but same port numbers. Is this possible with sockets too?
So, we can say that socket can be recognised in the world by the node i.e. IP of node + process which requested the socket.
Not 'in the world'. Only within the localhost. The socket only exists within the localhost, and the process ID is only known within the localhost.
Hence, comes the concept of socket address which is basically IP of node + port address (used for identifying process)
No. The port identifies the service. The process implements the service.
From where comes the idea of ports here.
RFC 793.
Socket can be identified as IP of node + Process ID.
No they can't. A peer on another host has no way of getting a remote process ID. Some fixed operating-system-agnostic identifier is required. And a process can own many ports. The suggestion doesn't begin to make sense.
Why ports are required to identify a process.
Ports do not identify a process. The question doesn't make sense.
Why can't the process descriptor be self sufficient. Why port address.
Because the first question you asked is fallacious . This is just another version of it.
Why do we need to bind the socket with a socket address if the node has to just pass the data and nothing needs to be received.
Because connections are identified by address:port pairs.
Binding of socket address essentially means "to start recognising socket with IP address of node + port address apart from its descriptor" which is useful for other nodes if they wish to send some data to Node N1.
It is also rather useful for this node, to know where the incoming data should go.
But what I think is that for any process in a node that wish to communicate over network, there should be one "global" socket which will not be binded. All processes will use it for sending data only. If in case any node wish to recieve data, they can have a separate socket which will be binded so that other nodes in network can recognise that particular socket.
Regardless of the invalidity and pointlessness of this scheme, your thoughts are 40 years too late.
Can I have two ports which are like TCP port 3000 and UDP port 3000 i.e. separate ports with different transport protocol but same port numbers.
Yes.
Where exactly does TCP/UDP fit in the picture?
They implement ports.
Is this possible with sockets too?
I can't make any sense out of this question. All sockets are distinct from each other.

How does a TCP server handle multiple Sockets to listening to the same port? [duplicate]

This might be a very basic question but it confuses me.
Can two different connected sockets share a port? I'm writing an application server that should be able to handle more than 100k concurrent connections, and we know that the number of ports available on a system is around 60k (16bit). A connected socket is assigned to a new (dedicated) port, so it means that the number of concurrent connections is limited by the number of ports, unless multiple sockets can share the same port. So the question.
TCP / HTTP Listening On Ports: How Can Many Users Share the Same Port
So, what happens when a server listen for incoming connections on a TCP port? For example, let's say you have a web-server on port 80. Let's assume that your computer has the public IP address of 24.14.181.229 and the person that tries to connect to you has IP address 10.1.2.3. This person can connect to you by opening a TCP socket to 24.14.181.229:80. Simple enough.
Intuitively (and wrongly), most people assume that it looks something like this:
Local Computer | Remote Computer
--------------------------------
<local_ip>:80 | <foreign_ip>:80
^^ not actually what happens, but this is the conceptual model a lot of people have in mind.
This is intuitive, because from the standpoint of the client, he has an IP address, and connects to a server at IP:PORT. Since the client connects to port 80, then his port must be 80 too? This is a sensible thing to think, but actually not what happens. If that were to be correct, we could only serve one user per foreign IP address. Once a remote computer connects, then he would hog the port 80 to port 80 connection, and no one else could connect.
Three things must be understood:
1.) On a server, a process is listening on a port. Once it gets a connection, it hands it off to another thread. The communication never hogs the listening port.
2.) Connections are uniquely identified by the OS by the following 5-tuple: (local-IP, local-port, remote-IP, remote-port, protocol). If any element in the tuple is different, then this is a completely independent connection.
3.) When a client connects to a server, it picks a random, unused high-order source port. This way, a single client can have up to ~64k connections to the server for the same destination port.
So, this is really what gets created when a client connects to a server:
Local Computer | Remote Computer | Role
-----------------------------------------------------------
0.0.0.0:80 | <none> | LISTENING
127.0.0.1:80 | 10.1.2.3:<random_port> | ESTABLISHED
Looking at What Actually Happens
First, let's use netstat to see what is happening on this computer. We will use port 500 instead of 80 (because a whole bunch of stuff is happening on port 80 as it is a common port, but functionally it does not make a difference).
netstat -atnp | grep -i ":500 "
As expected, the output is blank. Now let's start a web server:
sudo python3 -m http.server 500
Now, here is the output of running netstat again:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
So now there is one process that is actively listening (State: LISTEN) on port 500. The local address is 0.0.0.0, which is code for "listening for all ip addresses". An easy mistake to make is to only listen on port 127.0.0.1, which will only accept connections from the current computer. So this is not a connection, this just means that a process requested to bind() to port IP, and that process is responsible for handling all connections to that port. This hints to the limitation that there can only be one process per computer listening on a port (there are ways to get around that using multiplexing, but this is a much more complicated topic). If a web-server is listening on port 80, it cannot share that port with other web-servers.
So now, let's connect a user to our machine:
quicknet -m tcp -t localhost:500 -p Test payload.
This is a simple script (https://github.com/grokit/quickweb) that opens a TCP socket, sends the payload ("Test payload." in this case), waits a few seconds and disconnects. Doing netstat again while this is happening displays the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:54240 ESTABLISHED -
If you connect with another client and do netstat again, you will see the following:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.10:500 192.168.1.13:26813 ESTABLISHED -
... that is, the client used another random port for the connection. So there is never confusion between the IP addresses.
A server socket listens on a single port. All established client connections on that server are associated with that same listening port on the server side of the connection. An established connection is uniquely identified by the combination of client-side and server-side IP/Port pairs. Multiple connections on the same server can share the same server-side IP/Port pair as long as they are associated with different client-side IP/Port pairs, and the server would be able to handle as many clients as available system resources allow it to.
On the client-side, it is common practice for new outbound connections to use a random client-side port, in which case it is possible to run out of available ports if you make a lot of connections in a short amount of time.
A connected socket is assigned to a new (dedicated) port
That's a common intuition, but it's incorrect. A connected socket is not assigned to a new/dedicated port. The only actual constraint that the TCP stack must satisfy is that the tuple of (local_address, local_port, remote_address, remote_port) must be unique for each socket connection. Thus the server can have many TCP sockets using the same local port, as long as each of the sockets on the port is connected to a different remote location.
See the "Socket Pair" paragraph in the book "UNIX Network Programming: The sockets networking API" by
W. Richard Stevens, Bill Fenner, Andrew M. Rudoff at: http://books.google.com/books?id=ptSC4LpwGA0C&lpg=PA52&dq=socket%20pair%20tuple&pg=PA52#v=onepage&q=socket%20pair%20tuple&f=false
Theoretically, yes. Practice, not. Most kernels (incl. linux) doesn't allow you a second bind() to an already allocated port. It weren't a really big patch to make this allowed.
Conceptionally, we should differentiate between socket and port. Sockets are bidirectional communication endpoints, i.e. "things" where we can send and receive bytes. It is a conceptional thing, there is no such field in a packet header named "socket".
Port is an identifier which is capable to identify a socket. In case of the TCP, a port is a 16 bit integer, but there are other protocols as well (for example, on unix sockets, a "port" is essentially a string).
The main problem is the following: if an incoming packet arrives, the kernel can identify its socket by its destination port number. It is a most common way, but it is not the only possibility:
Sockets can be identified by the destination IP of the incoming packets. This is the case, for example, if we have a server using two IPs simultanously. Then we can run, for example, different webservers on the same ports, but on the different IPs.
Sockets can be identified by their source port and ip as well. This is the case in many load balancing configurations.
Because you are working on an application server, it will be able to do that.
I guess none of the answers tells every detail of the process, so here it goes:
Consider an HTTP server:
It asks the OS to bind the port 80 to one or many IP addresses (if you choose 127.0.0.1, only local connections are accepted. You can choose 0.0.0.0 to bind to all IP addresses (localhost, local network, wide area network, both IP versions)).
When a client connects to that port, it WILL lock it up for a while (that's why the socket has a backlog: it queues a number of connection attempts, because they ARE NOT instantaneous).
The OS then chooses a random port and transfer that connection to that port (think of it as a temporary port that will handle all the traffic from now on).
The port 80 is then released for the next connection (first, it will accept the first one in the backlog).
When client or server disconnects, the random port is held open for a while (CLOSE_WAIT in the remote side, TIME_WAIT in the local side). That allows flushing some lost packets along the path. The default time for that state is 2 * MSL seconds (and it WILL consume memory while is waiting).
After that waiting, that random port is free again to receive other connections.
So, TCP cannot even share a port amongst two IP's!
No. It is not possible to share the same port at a particular instant. But you can make your application such a way that it will make the port access at different instant.
Absolutely not, because even multiple connections may shave same ports but they'll have different IP addresses

Demultiplexing in TCP/UDP

I know there is an older answer to this question here, though it does not seem to answer my question. If in UDP two people with different IP and different ports send data to the same server (same IP) at the same socket (since in UDP there is only one socket per application - correct me if i am wrong), how does server recognises which person is who?
Does it change anything if the two people use (by luck or not) the same port as source port but with different source IP?
The server can receive UDP datagrams from two different IP/port pairs (IP could be same, port could be same, or both could be different) on the same port. The recvfrom() function returns the source IP/port of the datagram in addition to the data.
As mentioned in the question you referenced, a UDP socket is defined only by the local IP and local port. The remote IP and port can differ for both outgoing and incoming packets.

socket programming - why web server still using listen port 80 to communicate with client even after they accepted the connection?

Usually a web server is listening to any incoming connection through port 80. So, my question is that shouldn't it be that in general concept of socket programming is that port 80 is for listen for incoming connection. But then after the server accepted the connection, it will use another port e.g port 12345 to communicate with the client. But, when I look into the wireshark, the server is always using port 80 during the communication. I am confused here.
So what if https://www.facebook.com:443, it has hundreds of thousands of connection to the it at a second. Is it possible for a single port to handle such a large amount of traffic?
A particular socket is uniquely identified by a 5-tuple (i.e. a list of 5 particular properties.) Those properties are:
Source IP Address
Destination IP Address
Source Port Number
Destination Port Number
Transport Protocol (usually TCP or UDP)
These parameters must be unique for sockets that are open at the same time. Where you're probably getting confused here is what happens on the client side vs. what happens on the server side in TCP. Regardless of the application protocol in question (HTTP, FTP, SMTP, whatever,) TCP behaves the same way.
When you open a socket on the client side, it will select a random high-number port for the new outgoing connection. This is required, otherwise you would be unable to open two separate sockets on the same computer to the same server. Since it's entirely reasonable to want to do that (and it's very common in the case of web servers, such as having stackoverflow.com open in two separate tabs) and the 5-tuple for each socket must be unique, a random high-number port is used as the source port. However, each of those sockets will connect to port 80 at stackoverflow.com's webserver.
On the server side of things, stackoverflow.com can already distinguish between those two different sockets from your client, again, because they already have different client-side port numbers. When it sees an incoming request packet from your browser, it knows which of the sockets it has open with you to respond to because of the different source port number. Similarly, when it wants to send a response packet to you, it can send it to the correct endpoint on your side by setting the destination port number to the client-side port number it got the request from.
The bottom line is that it's unnecessary for each client connection to have a separate port number on the server's side because the server can already uniquely identify each client connection by its client IP address and client-side port number. This is the way TCP (and UDP) sockets work regardless of application-layer protocol.
shouldn't it be that in general concept of socket programming is that port 80 is for listen for incoming connection. But then after the server accepted the connection, it will use another port e.g port 12345 to communicate with the client.
No.
But, when I look into the wireshark, the server is always using port 80 during the communication.
Yes.
I am confused here.
Only because your 'general concept' isn't correct. An accepted socket uses the same local port as the listening socket.
So what if https://www.facebook.com:443, it has hundreds of thousands of connection to the it at a second. Is it possible for a single port to handle such a large amount of traffic?
A port is only a number. It isn't a physical thing. It isn't handling anything. TCP is identifying connections based on the tuple {source IP, source port, target IP, target port}. There's no problem as long as the entire tuple is unique.
Ports are a virtual concept, not a hardware ressource, it's no harder to handle 10 000 connection on 1 port than 1 connection each on 10 000 port (it's probably much faster even)
Not all servers are web servers listening on port 80, nor do all servers maintain lasting connections. Web servers in particular are stateless.
Your suggestion to open a new port for further communication is exactly what happens when using the FTP protocol, but as you have seen this is not necessary.
Ports are not a physical concept, they exist in a standardised form to allow multiple servers to be reachable on the same host without specialised multiplexing software. Such software does still exist, but for entirely different reasons (see: sshttp). What you see as a response from the server on port 80, the server sees as a reply to you on a not-so-random port the OS assigned your connection.
When a server listening socket accepts a TCP request in the first time ,the function such as Socket java.net.ServerSocket.accept() will return a new communication socket whoes port number is the same as the port from java.net.ServerSocket.ServerSocket(int port).
Here are the screen shots.