Demultiplexing in TCP/UDP - sockets

I know there is an older answer to this question here, though it does not seem to answer my question. In UDP, if two people with different IPs and different ports send data to the same server (same IP) on the same socket (since in UDP there is only one socket per application; correct me if I am wrong), how does the server recognise which person is which?
Does it change anything if the two people use (by chance or not) the same source port but different source IPs?

The server can receive UDP datagrams from two different IP/port pairs (IP could be same, port could be same, or both could be different) on the same port. The recvfrom() function returns the source IP/port of the datagram in addition to the data.
As mentioned in the question you referenced, a UDP socket is defined only by the local IP and local port. The remote IP and port can differ for both outgoing and incoming packets.
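As a rough illustration of the recvfrom() behaviour described above, here is a minimal Python sketch. It runs on loopback only, and the function name `demo_udp_sources` and the message texts are made up for this example. One server socket receives datagrams from two client sockets and tells them apart purely by the source address that `recvfrom()` reports.

```python
import socket


def demo_udp_sources():
    """Send one datagram from each of two clients to a single UDP server socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    seen = {}
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(2.0)
        server_addr = server.getsockname()

        for i, client in enumerate(clients):
            client.bind(("127.0.0.1", 0))  # each client gets its own source port
            client.sendto(f"hello from client {i}".encode(), server_addr)
        client_addrs = [client.getsockname() for client in clients]

        for _ in clients:
            # recvfrom() returns the payload and the sender's (IP, port)
            data, source = server.recvfrom(1024)
            seen[source] = data.decode()
    finally:
        server.close()
        for client in clients:
            client.close()
    return seen, client_addrs


if __name__ == "__main__":
    received, _ = demo_udp_sources()
    for source, text in received.items():
        print(source, "->", text)
```

Both clients here share the same source IP and differ only in source port, which is enough for the server to keep them apart.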

Are socket ports the same as regular ports? [duplicate]

I read something I found contradictory with my current understanding of ports. If you google "how many ports does a server have", the first thing to come up states the following:
The server generally only ever uses one port, no matter how many clients are connected. It is the tuple of (client IP, client port,
server IP, server port) that must be unique for each TCP connection -
so the limit of 65535 ports is only relevant for how many connections
a single client can make to a single server.
I thought that each time a client establishes a connection to a server, a socket is created using a regular port for the connection between the two?
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?
I thought that each time a client establishes a connection to a server, a socket is created using a regular port for the connection between the two?
The term "port" in this context is being used to describe, essentially, an address. The port number, along with the IP address, uniquely identifies one endpoint of the network.
Not only does the server endpoint generally only use a single port number, it would be a lot more difficult to make connections to the server if it didn't, because what port number would the client endpoint use to request the connection? DNS allows a client to look up the IP address, if the IP address is not already known, but there's no such facility for port numbers. So the port number has to be known in advance.
So, no…it is not the case that each time a client makes a connection, a socket is created using a "regular port" for the connection between the two. There's no "regular port". There's just "port", all ports are the same, and they are simply a number that identifies the endpoint's address.
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?
Yes, it can. On the server end, the port number is (generally) always the same. For example, an HTTP server will (generally) use port 80. The listening socket will have as its port number "80", as will the server-side socket for each connection.
The port number can be reused like this, because each socket has other identifying characteristics besides the IP address and port number. In particular, the server's listening socket is unique; there is only one socket on the server end that has that IP address, that port number, and which has no connections (i.e. is listening).
Once a connection is made, a new socket is created to represent that connection. That socket can be uniquely identified because, unlike the listening socket, it has a connection (i.e. a remote endpoint) associated with it, along with the local IP address and port number. When the client endpoint sends data to the server, the server's TCP implementation can tell which socket the data should be delivered to, because the data comes from a specific remote endpoint, which has its own IP address and port number.
The combination of the server's and client's unique IP addresses and port numbers uniquely identifies that connection, making it distinct from any other socket on the server that may have the same server-side endpoint's IP address and port number.
In the text you quoted, this part is describing exactly this distinct, unique identification of a socket:
It is the tuple of (client IP, client port, server IP, server port) that must be unique for each TCP connection
In this way, the server's IP address and port number can be used an indefinite number of times (not counting other constrained resources on the server, like memory and tables that hold the state of the network connections).
The limitation on port numbers only comes into play when trying to create additional listening sockets (for servers) or additional connections (for clients). Servers typically won't run out of port numbers unless they are implementing a protocol that requires the server to create a connection back to a client's listening socket (this is uncommon), and clients won't run out of port numbers unless they try to make a very large number of connections.
It is this latter limit that this part of the text you quoted is referring to:
the limit of 65535 ports is only relevant for how many connections a single client can make to a single server.
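To make the server-side picture concrete, the following Python sketch (loopback only; `demo_shared_server_port` is a name invented for this example) connects a few clients to one listening socket and collects the tuple (client IP, client port, server IP, server port) for each accepted connection. It is only an illustration under those assumptions, not a model of how a production server is written.

```python
import socket


def demo_shared_server_port(n_clients=3):
    """Connect several clients to one listener and collect each connection's 4-tuple."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients, accepted = [], []
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        listener.listen(n_clients)
        listener.settimeout(2.0)
        listen_port = listener.getsockname()[1]

        for _ in range(n_clients):
            clients.append(
                socket.create_connection(("127.0.0.1", listen_port), timeout=2.0)
            )
            conn, _peer = listener.accept()  # a new socket per connection
            accepted.append(conn)

        # (client IP, client port, server IP, server port) as seen by the server
        four_tuples = [conn.getpeername() + conn.getsockname() for conn in accepted]
    finally:
        for sock in clients + accepted + [listener]:
            sock.close()
    return listen_port, four_tuples


if __name__ == "__main__":
    port, tuples = demo_shared_server_port()
    print("listening port:", port)
    for entry in tuples:
        print(entry)
```

Every accepted socket reports the listening port as its local port; only the client port differs from one tuple to the next.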

What is the correct definition of a socket?

I have read contradictory definitions of what a socket comprises (mainly in this question).
The first definition is that a socket comprises the following:
{Source IP Address, Source Port Number}
The second definition is that a socket comprises the following:
{Source IP Address, Source Port Number, Destination IP Address,
Destination Port Number}
Is there an official document or something that states what the correct definition is?
Also, is the Transport protocol included in the socket?
If you look at the RFCs, e.g. RFC 793, TRANSMISSION CONTROL PROTOCOL, you will see the definition:
Multiplexing:
To allow for many processes within a single Host to use TCP
communication facilities simultaneously, the TCP provides a set of
addresses or ports within each host. Concatenated with the network
and host addresses from the internet communication layer, this forms a
socket. A pair of sockets uniquely identifies each connection. That
is, a socket may be simultaneously used in multiple connections.
The first definition applies to an unconnected TCP or UDP socket.
The second definition applies to a connected TCP or UDP socket.
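The difference between the two cases can be observed from the sockets API. The Python sketch below is an illustration only, with the helper names `endpoints` and `demo_connected_udp` invented for it. It inspects a UDP socket before and after `connect()`: before, only a local address is associated with it; afterwards, a remote address is recorded as well.

```python
import socket


def endpoints(sock):
    """Return (local address, remote address or None) for a socket."""
    local = sock.getsockname()
    try:
        remote = sock.getpeername()
    except OSError:  # not connected: no remote endpoint is associated
        remote = None
    return local, remote


def demo_connected_udp():
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        peer.bind(("127.0.0.1", 0))
        sock.bind(("127.0.0.1", 0))
        peer_addr = peer.getsockname()

        before = endpoints(sock)  # local address only
        sock.connect(peer_addr)   # for UDP this just records the remote endpoint
        after = endpoints(sock)   # local and remote address
    finally:
        peer.close()
        sock.close()
    return before, after, peer_addr


if __name__ == "__main__":
    before, after, _ = demo_connected_udp()
    print("unconnected:", before)
    print("connected:  ", after)
```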

Why is UDP socket identified by destination IP address and destination port?

According to "Computer networking: a top-down approach", Kurose et al., a UDP socket is fully identified by destination IP and destination port.
Why do we need destination IP here? I thought UDP only need the destination port for the demultiplexing.
The machine may have multiple IPs, and different sockets may be bound to the same port on different IPs. It needs to use the destination IP to know which of these sockets the incoming datagram should be sent to.
In fact, it's quite common to use a different socket for each IP. When sending the reply, we want to ensure that the source IP matches the request's destination IP, so that the client can tell that the response came from the same server it sent to. By using different sockets for each IP, and sending the reply out the same socket that the request came in on, this consistency is maintained. Some socket implementations have an extension to allow setting the source IP at the time the reply is being sent, so they can use a single socket for all IPs, but this is not part of the standard sockets API.
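The lookup described above can be pictured as a table keyed by local IP and local port. The Python sketch below is a deliberately simplified model, not how any particular kernel implements it. The socket names and the addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are hypothetical, and the wildcard fallback assumes the usual convention that a socket bound to 0.0.0.0 accepts datagrams for any local IP.

```python
WILDCARD = "0.0.0.0"  # INADDR_ANY: bound to every local address


def pick_udp_socket(bound, dst_ip, dst_port):
    """Pick the socket for an incoming datagram.

    `bound` maps (local IP, local port) to a socket label. An exact match on
    the destination IP wins; otherwise fall back to a wildcard binding.
    """
    exact = bound.get((dst_ip, dst_port))
    if exact is not None:
        return exact
    return bound.get((WILDCARD, dst_port))


if __name__ == "__main__":
    # Hypothetical host with two IPs, one socket per IP, both on port 53.
    bound = {
        ("192.0.2.10", 53): "socket_a",
        ("198.51.100.7", 53): "socket_b",
    }
    print(pick_udp_socket(bound, "192.0.2.10", 53))
    print(pick_udp_socket(bound, "198.51.100.7", 53))
```

Without the destination IP in the key, the two entries would collide, which is the reason the port alone is not enough.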
I think that you are confusing UDP with multicast.
Multicast delivers a single datagram to every host that has joined a multicast group, so the datagram is addressed to a group address rather than to one particular host.
Unicast UDP, by contrast, is delivered to only one IP. This is why it needs that destination IP address.

Socket programming - why does a web server still use listening port 80 to communicate with the client even after accepting the connection?

Usually a web server listens for incoming connections on port 80. My understanding of the general concept of socket programming was that port 80 is only for listening for incoming connections, and that after the server accepts a connection it uses another port, e.g. port 12345, to communicate with the client. But when I look in Wireshark, the server is always using port 80 during the communication. I am confused here.
So what about https://www.facebook.com:443, which has hundreds of thousands of connections to it at any second? Is it possible for a single port to handle such a large amount of traffic?
A particular socket is uniquely identified by a 5-tuple (i.e. a list of 5 particular properties.) Those properties are:
Source IP Address
Destination IP Address
Source Port Number
Destination Port Number
Transport Protocol (usually TCP or UDP)
These parameters must be unique for sockets that are open at the same time. Where you're probably getting confused here is what happens on the client side vs. what happens on the server side in TCP. Regardless of the application protocol in question (HTTP, FTP, SMTP, whatever,) TCP behaves the same way.
When you open a socket on the client side, it will select a random high-number port for the new outgoing connection. This is required, otherwise you would be unable to open two separate sockets on the same computer to the same server. Since it's entirely reasonable to want to do that (and it's very common in the case of web servers, such as having stackoverflow.com open in two separate tabs) and the 5-tuple for each socket must be unique, a random high-number port is used as the source port. However, each of those sockets will connect to port 80 at stackoverflow.com's webserver.
On the server side of things, stackoverflow.com can already distinguish between those two different sockets from your client, again, because they already have different client-side port numbers. When it sees an incoming request packet from your browser, it knows which of the sockets it has open with you to respond to because of the different source port number. Similarly, when it wants to send a response packet to you, it can send it to the correct endpoint on your side by setting the destination port number to the client-side port number it got the request from.
The bottom line is that it's unnecessary for each client connection to have a separate port number on the server's side because the server can already uniquely identify each client connection by its client IP address and client-side port number. This is the way TCP (and UDP) sockets work regardless of application-layer protocol.
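The client-side half of this can be checked with a small experiment. In the Python sketch below (loopback only; `demo_client_ports` is an invented name), one client opens two connections to the same server address, much like two browser tabs, and the operating system assigns each connection its own source port.

```python
import socket


def demo_client_ports():
    """Open two connections to one server and return (server port, port 1, port 2)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first = second = None
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        listener.listen(2)
        server_addr = listener.getsockname()

        # Same client IP, same server IP, same server port in both cases.
        first = socket.create_connection(server_addr, timeout=2.0)
        second = socket.create_connection(server_addr, timeout=2.0)

        # getsockname() on the client side shows the OS-assigned source port.
        return server_addr[1], first.getsockname()[1], second.getsockname()[1]
    finally:
        for sock in (first, second, listener):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    server_port, port_one, port_two = demo_client_ports()
    print("server port:", server_port)
    print("client source ports:", port_one, port_two)
```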
Shouldn't it be that, in the general concept of socket programming, port 80 is for listening for incoming connections, and that after the server has accepted the connection it uses another port, e.g. port 12345, to communicate with the client?
No.
But when I look in Wireshark, the server is always using port 80 during the communication.
Yes.
I am confused here.
Only because your 'general concept' isn't correct. An accepted socket uses the same local port as the listening socket.
So what about https://www.facebook.com:443, which has hundreds of thousands of connections to it at any second? Is it possible for a single port to handle such a large amount of traffic?
A port is only a number. It isn't a physical thing. It isn't handling anything. TCP is identifying connections based on the tuple {source IP, source port, target IP, target port}. There's no problem as long as the entire tuple is unique.
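One way to picture the tuple-based identification is as a dictionary lookup. The Python sketch below is a simplified model with hypothetical names and documentation-range addresses, not the data structure any real TCP stack uses. A segment is matched against established connections by its full tuple first, and falls back to a listening socket only if no connection matches.

```python
def find_tcp_socket(connections, listeners, src_ip, src_port, dst_ip, dst_port):
    """Return the socket label an incoming TCP segment would be delivered to.

    `connections` maps (src IP, src port, dst IP, dst port) to an established
    connection; `listeners` maps (dst IP, dst port) to a listening socket.
    """
    key = (src_ip, src_port, dst_ip, dst_port)
    if key in connections:
        return connections[key]
    return listeners.get((dst_ip, dst_port))


if __name__ == "__main__":
    # Hypothetical server 192.0.2.1:443 with two established connections.
    listeners = {("192.0.2.1", 443): "listening socket"}
    connections = {
        ("198.51.100.7", 50001, "192.0.2.1", 443): "connection 1",
        ("198.51.100.7", 50002, "192.0.2.1", 443): "connection 2",
    }
    print(find_tcp_socket(connections, listeners, "198.51.100.7", 50002, "192.0.2.1", 443))
    print(find_tcp_socket(connections, listeners, "203.0.113.9", 40000, "192.0.2.1", 443))
```

The number of entries in `connections` is bounded by memory, not by the number of ports, since every entry shares the same server port.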
Ports are a virtual concept, not a hardware resource. It is no harder to handle 10 000 connections on 1 port than 1 connection each on 10 000 ports (it's probably even much faster).
Not all servers are web servers listening on port 80, nor do all servers maintain lasting connections. HTTP in particular is a stateless protocol.
Your suggestion to open a new port for further communication is exactly what happens when using the FTP protocol, but as you have seen this is not necessary.
Ports are not a physical concept; they exist in a standardised form to allow multiple servers to be reachable on the same host without specialised multiplexing software. Such software does still exist, but for entirely different reasons (see: sshttp). What you see as a response from the server on port 80, the server sees as a reply to you on a not-so-random port the OS assigned your connection.
When a listening server socket accepts a TCP connection request, a function such as Socket java.net.ServerSocket.accept() returns a new communication socket whose local port number is the same as the port passed to java.net.ServerSocket.ServerSocket(int port).

How does TCP/UDP demultiplexing work?

I have the following statement.
"In TCP, the receiver host uses all of source IP, source port, destination IP and destination port to direct datagram to appropriate socket. While in UDP, the receiver only checks destination port number to direct the datagram. "
Is the above statement true?
If yes, does it mean that in TCP the same port can be used for multiple sockets in one process, while in UDP only one socket can use a port in one process? What about sockets in different processes? Can multiple processes use the same port in TCP/UDP? (in programming languages: C/C++/Java)
If not, why?
"In TCP, the receiver host uses all of source IP, source port, destination IP and destination port to direct datagram to appropriate socket. While in UDP, the receiver only checks destination port number to direct the datagram. "
Is the above statement true?
Yes.
If yes, does it mean that in TCP the same port can be used for multiple sockets in one process,
Yes, under some circumstances.
while in UDP only one socket can use a port in one process?
No, see below.
What about sockets in different processes? Can multiple processes use the same port in TCP/UDP? (in programming language: C/C++/Java)
Under some circumstances, yes. A UDP port has to be designated as reusable by all processes that want to share it. A TCP port can only be reused by sockets bound to different interfaces: there is no sharing.
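The "no sharing" case for TCP is easy to reproduce. The Python sketch below (the function name `second_bind_fails` is invented for this example) binds a listening socket and then tries to bind a second socket to exactly the same IP address and port with default socket options, i.e. without setting anything like SO_REUSEADDR or SO_REUSEPORT. Under that assumption the second bind is expected to be rejected.

```python
import socket


def second_bind_fails():
    """Return True if binding a second TCP socket to an in-use address is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        first.listen(1)
        addr = first.getsockname()
        try:
            second.bind(addr)  # same IP, same port, default options
        except OSError:
            return True        # typically reported as "address already in use"
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second bind refused:", second_bind_fails())
```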
What that means is, in TCP, a unique communication "channel" can be described as the four-tuple: (src-ip, src-port, dst-ip, dst-port).
In UDP, all packets destined to a certain port are delivered to the only UDP socket listening on that port, regardless of the source address and port of said packet. I like to think of it as a funnel.