What exactly is happening when a server socket accepts client sockets?

I'm studying socket programming, and the server socket's accept() is confusing me. I wrote two scenarios for what accept() might do; please take a look:
When the server socket does accept(), it creates a new (client) socket that is bound to a port different from the one the server socket is bound to. Socket communication is then done via the newly bound port, and the server socket (used for accept() only) waits for another client connection on the originally bound port.
I think this is not quite correct, because (1) a port maps to a single process and (2) accept() is an inside-process matter, and a single process can have multiple sockets. So I thought of a second scenario, based on some Stack Overflow answers:
When a server socket does accept(), it creates a new (client) socket that is not bound to any specific port. When a client communicates with the server, it uses the port that is bound to the server socket (the one that accept()s connections), and which client socket actually does the communicating is resolved by the (sourceIP, sourcePort, destIP, destPort) tuple from the TCP header(?) at the transport level (this is also suspicious, because I thought a socket was an application-level object).
This scenario also raises some questions. If socket communication still uses the server socket's port, i.e. the client sends messages to the server socket's port, doesn't it use the server socket's backlog queue? I mean, how can messages from a client be distinguished between connect() and read() or write()? And how can they be resolved to each client socket in the server, without any port binding?
If one of my scenarios is correct, would that answer the questions that follow? Or perhaps both of my scenarios are wrong. I'd be very thankful if you could guide me to the correct answers, or at least towards some relevant texts to study.

When you create a socket and do a bind on that socket and then a listen, what you have is what is called a listening socket.
When a connection is established, this socket is basically cloned to a new socket, called the servicing socket; the port to which it is bound is still the same as the original port.
But there is an important distinction between this socket and the listening socket from before: namely, it is part of a socket pair.
It is the socket pair that uniquely identifies the connection. So, as there are two sockets in the picture for a socket pair, there are two IP addresses and two ports, one pair for each end of the TCP communication channel. While cloning the servicing socket, the TCP kernel allocates what is called a TCB (Transmission Control Block) and stores those two IP addresses and two ports in it. The TCB also contains the socket number that belongs to it.
Each time a TCP segment comes in, the TCP header is checked, including whether it is a SYN. A SYN means connection establishment, which you have already passed at this point, and for a SYN the kernel goes through its list of listening sockets. If it is a normal TCP segment, not a SYN, both port numbers are in the TCP header and the IP addresses are in the IP header, so using this information the kernel can find the TCB that belongs to this TCP connection. (For a SYN this information is also present, but as I said, for a SYN only the listening sockets have to be processed.)
That is in a nutshell how it works.
This information can be found in UNIX Network Programming: The Sockets Networking API. It describes the relationship between TCP and the sockets API, whereas other reference material usually does not cover that in as much detail and instead highlights the nitty-gritty of TCP itself.
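A minimal sketch of that flow in C may help (this is not taken from the book; port 8080 and the greeting string are arbitrary choices, and error handling is omitted for brevity). The socket that bind()/listen() produce stays in LISTEN state, while each call to accept() hands back a servicing socket for one connection:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);    /* the listening socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                        /* arbitrary example port */

    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 16);                              /* socket is now in LISTEN state */

    for (;;) {
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        /* accept() "clones" the listening socket: conn_fd is the servicing
           socket, bound to the same local port (8080) but tied to one peer. */
        int conn_fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
        if (conn_fd < 0)
            continue;

        const char *msg = "hello\n";
        write(conn_fd, msg, strlen(msg));
        close(conn_fd);   /* only the servicing socket closes; listen_fd keeps waiting */
    }
}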

When the server socket does accept(), it creates a new (client) socket that is bound to a port different from the one the server socket is bound to. Socket communication is then done via the newly bound port, and the server socket (used for accept() only) waits for another client connection on the originally bound port.
No.
I think this is not quite correct
It is a wrong answer.
because (1) a port maps to a single process
That doesn't mean anything relevant.
and (2) accept() is an inside-process matter
Nor does that. It doesn't appear to mean anything at all actually.
and a single process can have multiple sockets.
That's true but it doesn't have any bearing on why your answer is wrong. The reason your answer is wrong is because no second port is used.
When a server socket does accept(), it creates a new (client) socket that is not bound to any specific port
No. It creates a second socket that inherits everything from the server socket: port number, buffer sizes, socket options, ... everything except the file descriptor and the LISTEN state, and maybe something else I've forgotten. It then sets the remote IP:port of the socket to the client's and puts the socket into the ESTABLISHED state.
and when the client communicates with the server
The client has already communicated with the server. That's why we are creating this socket.
it uses the port that is bound to the server socket (the one that accept()s connections), and which client socket actually does the communicating is resolved by the (sourceIP, sourcePort, destIP, destPort) tuple from the TCP header(?) at the transport level
This has already happened.
This is also suspicious because I thought a socket was an application-level object
No it isn't. A socket is a kernel-level object with an application-level file descriptor to identify it.
If socket communication still uses the server socket's port, i.e. the client sends messages to the server socket's port, doesn't it use the server socket's backlog queue?
No. The backlog queue is for incoming connect requests, not for data. Incoming data goes into the socket receive buffer.
I mean, how can messages from a client be distinguished between connect() and read() or write()?
Because a connect() request sets special bits in the TCP header. The final part of it can be combined with data.
And how can they be resolved to each client socket in the server, without any port binding?
Port binding happens the moment the socket is created in the call to accept(). You invented this difficulty yourself. It isn't real.
If one of my scenarios is correct, would that answer the questions that follow?
Neither of them is correct.
Or possibly both of my scenarios are wrong, so I would be very thankful if you could provide the right answers, or at least some relevant texts to study.
Surely you already have relevant texts to study? If you don't, you should read RFC 793, or the relevant chapters of W. R. Stevens, TCP/IP Illustrated, Volume I. You have several major misunderstandings here.
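If it helps to see this concretely, here is a small sketch (port 9000 is an arbitrary assumption, error handling omitted): calling getsockname() on both the listening socket and the socket returned by accept() reports the same local port, which is the point made above about no second port being used.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static unsigned short local_port(int fd) {
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    getsockname(fd, (struct sockaddr *)&a, &len);   /* ask the kernel for the local address */
    return ntohs(a.sin_port);
}

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                    /* arbitrary example port */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 8);

    int cfd = accept(lfd, NULL, NULL);              /* blocks until some client connects */

    printf("listening socket port: %u\n", (unsigned)local_port(lfd));  /* 9000 */
    printf("accepted socket port:  %u\n", (unsigned)local_port(cfd));  /* also 9000 */
    return 0;
}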

From the Linux programmer's manual, as found via man 2 accept:
The accept() system call is used with connection-based socket
types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection
request on the queue of pending connections for the listening socket,
sockfd, creates a new connected socket, and returns a new file
descriptor referring to that socket. The newly created socket is not
in the listening state. The original socket sockfd is unaffected by
this call.
So what happens is that you have a listening TCP socket. Someone requests to connect().
You then call accept(). The old listening socket remains in listening mode, while a new socket is created in connected mode; its port is the original listening port.
That does not interfere with the listening socket, because the new socket does not listen for incoming connections.
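For completeness, the client side of that exchange is just socket() plus connect(); the SYN sent by connect() is what lands in the listening socket's pending queue until the server calls accept(). The address 127.0.0.1:8080 below is an assumption for the sketch and error handling is omitted.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(8080);                     /* assumed server port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr); /* assumed server address */

    connect(fd, (struct sockaddr *)&srv, sizeof(srv));   /* performs the three-way handshake */

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);          /* read whatever the server sends */
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }

    close(fd);
    return 0;
}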

Related

Need more insight into the listen() system call on Linux

int listen(int sockfd, int backlog);
Here is a description from the Linux man page:
listen() marks the socket referred to by sockfd as a passive socket,
that is, as a socket that will be used to accept incoming connection
requests using accept(2).
The sockfd argument is a file descriptor that refers to a socket of
type SOCK_STREAM or SOCK_SEQPACKET.
The backlog argument defines the maximum length to which the queue of
pending connections for sockfd may grow. If a connection request
arrives when the queue is full, the client may receive an error with an
indication of ECONNREFUSED or, if the underlying protocol supports
retransmission, the request may be ignored so that a later reattempt at
connection succeeds.
The thing that bugs me a lot is why we actually need to call listen() at all. Is it to make the server start listening on the bound address and port? Isn't it the case that, as soon as an address and port are bound to the socket descriptor, a client can simply connect to that bound address? Things get more confusing when we don't make the listen() call while creating a UDP server: does the UDP server automatically start listening as soon as an address is bound to it?
A host can receive packets from anywhere at any time. With TCP, before you call listen() the operating system responds to any arriving packet with an RST. After listen() it replies to SYN packets with a SYN-ACK, adds information about the remote side to the list of pending "connections", and the connection is then established.
With UDP there are no connections. If an application is waiting for a UDP packet and one arrives, the operating system passes the packet to the app. If no one is expecting packets the OS drops them.
The bind() function just gives an address to a socket. It's not limited to servers; you can use bind() on client sockets as well to set the IP and port you connect from. It's usually not required, so you don't see it very often.
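To contrast with the TCP case, here is a rough sketch of a UDP receiver (port 5000 is an arbitrary assumption, error handling omitted): bind() alone is enough, and recvfrom() tells you who each datagram came from, so there is nothing for listen() or accept() to do.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);        /* UDP socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                    /* arbitrary example port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    for (;;) {
        char buf[1500];
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        /* Each datagram carries the sender's address; there is no accept(). */
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &len);
        if (n > 0)
            printf("got %zd bytes from %s:%u\n", n,
                   inet_ntoa(peer.sin_addr), (unsigned)ntohs(peer.sin_port));
    }
}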

In TCP, if the server uses another port to communicate, how will it inform the client?

I'm studying socket programming in C. In TCP communication, a classical situation is that once the server accept() a connect() request from a client, it will fork a new process to handle this communication. Then the child process will use another port to communicate with the client. My question is, how does the server inform the client that it will use another port rather than the original one to do the subsequent communication? Which field in the TCP header and which phase of the handshake can reflect the port change?
For example, process PA on server A is listening to its port 80. Now process PB on client B wants to connect to A's port 80. Once PA accepts PB's connecting request, it will fork a new process PA1 to handle the communication with PB. Am I right till now? Next, will PA1 still use port 80 or another port such as 1234 to communicate with PB? If it still uses 80, how can the server A distribute PB's communication to PA1? If it uses another port like 1234, how will the server A inform PB to use 1234 for the subsequent communication?
A TCP connection is uniquely identified by the tuple (source IP, source port, destination IP, destination port). This tuple is what the OS uses to "bind" the TCP connection to a process, i.e. to know which process it should deliver a TCP packet to.
When the server socket accepts the TCP connection and forks, the child inherits from the original process, so it effectively takes over the binding of the TCP connection. The client on the remote machine does not know, and does not need to know, that this happened. The whole network keeps seeing the same thing: packets with the same tuple flowing through the network.
Meanwhile, the original process keeps listening for new TCP connections. When a new TCP connection request arrives, even if it comes from the same machine as before, its source port must be different. From the OS's perspective it is a different tuple, so the OS can distinguish the TCP packets and deliver them to the right process.
You may ask how the client on the remote machine knows it has to use another port to initiate a new connection. This is simply because the client's OS knows (or is informed by the socket library) that the process is creating a separate new connection, and it assigns another unique port number to that connection. That is how multiple processes can communicate with the same server port without their messages getting mixed up.
To put it briefly, the accept-and-fork operation in the server just transfers ownership of a TCP connection binding to another process. Nothing changes in the server port used for the communication.
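A sketch of that accept-and-fork pattern (port 8080 is an arbitrary choice, error handling mostly omitted): the child inherits the accepted socket and keeps using the same server port, while the parent immediately goes back to accept().

#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    signal(SIGCHLD, SIG_IGN);                       /* avoid zombie children */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                    /* arbitrary example port */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 64);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;

        if (fork() == 0) {                          /* child: owns this one connection */
            close(lfd);                             /* the child never calls accept()  */
            const char *msg = "handled by child, still on port 8080\n";
            write(cfd, msg, strlen(msg));
            close(cfd);
            _exit(0);
        }
        close(cfd);                                 /* parent drops its copy and loops */
    }
}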
In TCP communication, a classical situation is that once the server accept() a connect() request from a client, it will fork a new process to handle this communication.
Correct, or start a thread.
Then the child process will use another port to communicate with the client.
No. It will use the same port, via the accepted socket, inherited in the case of a child process.
My question is, how does the server inform the client that it will use another port rather than the original one to do the subsequent communication?
It doesn't, because this isn't the 'classical situation'.
Which field in the TCP header and which phase of the handshake can reflect the port change?
None. It doesn't happen that way. It would be a waste of a port.
For example, process PA on server A is listening to its port 80. Now process PB on client B wants to connect to A's port 80. Once PA accepts PB's connecting request, it will fork a new process PA1 to handle the communication with PB. Am I right till now?
Yes.
Next, will PA1 still use port 80 or another port such as 1234 to communicate with PB?
Port 80.
If it still uses 80, how can the server A distribute PB's communication to PA1?
By inheritance of the accepted socket.
If it uses another port like 1234, how will the server A inform PB to use 1234 for the subsequent communication?
Doesn't happen.
The client chooses this port, not the server. The client will choose a port that's not already in use on that particular machine, and use that port to tell its connections apart (just as the server does).
For example say the client has IP address 1.2.3.4 and the server has IP address 4.3.2.1 and listens on port 80. If the client has two connections to that server and port, how will it tell them apart? Simple -- it assigns a different source port to each one. Say one gets port 50001 and one gets port 50002, then the two connections are:
1.2.3.4:50001 -> 4.3.2.1:80
and
1.2.3.4:50002 -> 4.3.2.1:80
The server knows these ports because it gets them from the TCP SYN packets sent from the client to the server. So the client tells the server, not the other way around.
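As a rough illustration (assuming a server reachable at 127.0.0.1:8080 instead of the 4.3.2.1:80 of the example, error handling omitted), opening two connections to the same destination and calling getsockname() on each shows the two different source ports the client OS picked.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int dial(const char *ip, unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    inet_pton(AF_INET, ip, &srv.sin_addr);
    connect(fd, (struct sockaddr *)&srv, sizeof(srv));  /* OS picks the source port here */
    return fd;
}

int main(void) {
    int a = dial("127.0.0.1", 8080);                    /* assumed reachable server */
    int b = dial("127.0.0.1", 8080);                    /* second connection, same destination */

    struct sockaddr_in la, lb;
    socklen_t len = sizeof(la);
    getsockname(a, (struct sockaddr *)&la, &len);
    len = sizeof(lb);
    getsockname(b, (struct sockaddr *)&lb, &len);

    /* Two distinct ephemeral source ports, e.g. 50001 and 50002. */
    printf("connection 1 source port: %u\n", (unsigned)ntohs(la.sin_port));
    printf("connection 2 source port: %u\n", (unsigned)ntohs(lb.sin_port));

    close(a);
    close(b);
    return 0;
}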

Sockets TCP server

I have a question about network connections.
For instance, a TCP server supports N connections simultaneously, each connection belonging to a different client host. The question is: how many sockets does the server need?
Thanks
I think this is a valid question and do not understand why it has been downvoted.
Before I continue, an important distinction must be made. A socket is a file descriptor, while a port is an "identifier" for a socket. File descriptors/sockets are owned by applications, so a port can be viewed as a way to route connections/packets to the correct application.
The way for example a web server works (or any other TCP-based server), is that you have a listen socket that is bound to a port (for example 80). When a client connects to the server, a new socket is automatically created by the operating system (this socket is the one that is returned by for example accept()). This socket is bound to the same local IP and port as the listen socket, but has a different remote IP/port. The operating system stores this mapping and routes packets belonging to this mapping to the new socket.
So the answer to your question is that only one listen socket is needed, but new sockets will be created as clients connect (and removed as they disconnect). The limit of sockets (file descriptors) that an application can create is controlled by the OS.
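If you want to see that OS-imposed limit from inside a program, something like the following sketch reads the per-process file-descriptor limit with getrlimit(); the exact numbers depend on your system configuration.

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);                  /* per-process open-file limit */
    printf("soft fd limit: %llu\n", (unsigned long long)rl.rlim_cur);
    printf("hard fd limit: %llu\n", (unsigned long long)rl.rlim_max);
    return 0;
}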

accept() function implementation in Unix

I have looked it up in the BSD code but got lost somewhere :(
the reason I want to check is this:
TCP RFC (http://www.ietf.org/rfc/rfc793.txt) sec 2.7 states:
"To provide for unique addresses within each TCP, we concatenate an internet address identifying the TCP with a port identifier to create a socket which will be unique throughout all networks connected together. A connection is fully specified by the pair of sockets at the ends."
Does this mean: socket = local (ip + port) ?
If yes, then the accept() function of Unix returns a new socket descriptor. Does this mean that a new socket is created (and in turn a new port) for responding to client requests?
PS: I am a novice in network programming.
[UPDATE] I understood what I read at How does the socket API accept() function work?
My only doubt is: if socket = (local port + local IP), then a new socket would mean a new port for the same IP. Going by this logic, accept() returns a new socket (thus a new port is created), so all sending should occur through this new port.
Is what I understand here correct?
You are mostly correct. When you accept(), a new socket is created and the listening socket stays open to allow more incoming connections but the new socket uses the same local port number as the listening socket.
A connection is defined by a 5-tuple: protocol, local-addr, local-port, remote-addr, remote-port.
Therefore, each accepted connection is unique even though they all share the same local port number because the remote ip/port is always different. The listening socket has no remote ip/port and so is also unique.
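A small sketch of that, assuming an arbitrary listening port of 8080 and omitting error handling: getpeername() on each accepted socket shows the remote IP and port that make the connection unique, while the local port stays the same for all of them.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                    /* arbitrary example port */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 16);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;

        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        getpeername(cfd, (struct sockaddr *)&peer, &len);   /* the remote half of the tuple */
        printf("remote end %s:%u (local port is 8080 for every connection)\n",
               inet_ntoa(peer.sin_addr), (unsigned)ntohs(peer.sin_port));
        close(cfd);
    }
}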

General sockets UDP programming question

I have an FPGA device with which my code needs to talk. The protocol is as follows:
I send a single non-zero byte (UDP) to turn on a feature. The FPGA board then begins spewing data on the port from which I sent.
Do you see my dilemma? I know which port I sent the message to, but I do not know from which port I sent (is this port not typically chosen automatically by the OS?).
My best guess for what I'm supposed to do is create a socket with the destination IP and port number and then reuse the socket for receiving. If I do so, will it already be set up to listen on the port from which I sent the original message?
Also, for your information, variations of this code will be written in Python and C#. I can look up specific API's as both follow the BSD socket model.
This is exactly what connect(2) and getsockname(2) are for. As a bonus, for a connected UDP socket you will not have to specify the destination address/port on each send, you will be able to discover an unreachable destination port (the ICMP reply from the target will manifest as an error on the next send instead of being dropped), and your OS will not have to implicitly connect and disconnect the UDP socket on each send, saving some cycles.
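A rough sketch of that approach (the FPGA address 192.168.1.50:3000 is a made-up placeholder, error handling omitted): connect() the UDP socket to the FPGA, send the enable byte, then getsockname() reports the source port the OS picked, and replies from the FPGA arrive on the same socket via plain recv().

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in fpga;
    memset(&fpga, 0, sizeof(fpga));
    fpga.sin_family = AF_INET;
    fpga.sin_port = htons(3000);                        /* assumed FPGA port */
    inet_pton(AF_INET, "192.168.1.50", &fpga.sin_addr); /* assumed FPGA address */

    connect(fd, (struct sockaddr *)&fpga, sizeof(fpga));  /* fix the peer; OS assigns a local port */

    unsigned char on = 1;
    send(fd, &on, 1, 0);                                  /* single non-zero byte to enable the feature */

    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    getsockname(fd, (struct sockaddr *)&local, &len);
    printf("OS chose source port %u\n", (unsigned)ntohs(local.sin_port));

    char buf[1500];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);            /* data the FPGA sends back */
    printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}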
You can bind a socket to a specific port; check man bind.
You can bind the socket to get the desired port.
The only problem with doing that is that you won't be able to run more than one instance of your program at a time on a computer.
You're using UDP to send/receive data. Simply create a new UDP socket and bind to your desired interface/port. Then instruct your FPGA program to send UDP packets back to the port you bound to. UDP does not require you to listen for or set up connections (that's only required with TCP).