BitTorrent peers connect with each other via TCP (mainly). When a peer A tries to connect to peer B, does peer B also needs to try to connect with A simultaneously so the TCP 3-way handshake happens and they form a connection? If not, why?
Also, I have been studying three bittorrent client implementations. While they start TCP connections with the obtained peers, I noticed none of them opens a TCP socket to listen on the port they are announcing to the tracker. Does it mean no one can initiate connection to them? Shouldn't they create such TCP socket?
When a peer A tries to connect to peer B, does peer B also needs to try to connect with A simultaneously so the TCP 3-way handshake happens and they form a connection? If not, why?
Connection setup is a general TCP feature, not specific to bittorrent. One side initiates the connection by calling connect on an unconnected socket and the other side has a listening socket configured on which it calls accept in a loop to create create connection-specific sockets for each accepted incoming connection.
There is a simultaneous open flow for connection setup but that's rarely relevant and the connect/accept flow is used by bittorrent clients.
I noticed none of them opens a TCP socket to listen on the port they are announcing to the tracker.
They generally do and should unless process privileges are insufficient to bind a particular port or another process is already listening on it, in which case they should log a warning at least.
If you used a portscan then you may be seeing firewall or NATs getting in the way rather than the client not having a listening socket open. Instead you could use something like netstat (may need some additional arguments, depending on OS) to show listening sockets.
If they truly do not have a listening socket open then yes, that would be a problem since they could not accept incoming connections and only talk to a more limited set of clients (those that do).
Bittorrent being a peer-to-peer protocol means that clients should be equals (peers) which means they should be equally capable of initiating and accepting connections.
As socks5 rfc says,
A UDP association terminates when the TCP connection
that the UDP ASSOCIATE request arrived on terminates.
I wonder, doesn't "the TCP connection that the UDP ASSOCIATE request arrived on" just terminate when it timeouts? As there is no more data need to be sent in that TCP connection.
Should the client send meaningless data in that TCP connection just to keep it alive while it need the UDP association?
I wonder, doesn't "the TCP connection that the UDP ASSOCIATE request arrived on" just terminate when it timeouts?
No. The TCP connection is kept alive as a signal that the proxied UDP socket is still valid. Without it the client has no way of knowing whether the server is still reachable or whether the UDP socket is still allocated.
Similarly, without the TCP connection the server has no other way of knowing whether the client is still connected.
Should the client send meaningless data in that TCP connection just to keep it alive while it need the UDP association?
There's no need to send meaningless data. Just keep the connection alive. TCP has a built-in keepalive mechanism which can be turned on.
Edit: It's worth pointing out that the default TCP keepalive times on both Linux and Windows is fairly long. See this question on ways of tweaking it for specific sockets.
I'm studying socket programming, and the server socket accept() is confusing me. I wrote two scenarios for server socket accept(), please take a look:
When the server socket does accept(), it creates a new (client) socket that is bound to a port that is different from the port the server socket is bound. So socket communication is done via newly bound port, and the server socket (for accept() only) is waiting for another client connection on the originally bound port.
I think this is not quite correct, because (1) a port matches to a single process and (2) socket accept is inside-process matter and single process can have multiple sockets. So thought of a second scenario, based on some of stackoverflow answers:
When a server socket does accept(), it creates a new (client) socket that is not bound to any specific port. When a client communicates with the server, it uses the port that is bound to the server socket (who accept()s connections) and which client socket to actually communicate is resolved by (sourceIP, sourcePort, destIP, destPort) tuple from TCP header(?) at Transmission level (this is also suspicious because I thought socket is somewhat of an application-level object)
This scenario also raises some questions. If the socket communications still use server socket's port, i.e. client sends some messages to the server socket port, doesn't it use the server socket's backlog queue? I mean, how can messages from a client be distinguished between connect() and read() or write()? And how can they be resolved to each client socket in the server, without any port binding?
If one of my scenarios is correct, would that answer to the questions following? Or perhaps, both of my scenarios are wrong. I'd be very thankful if you could guide me to correct answers, or at least, towards some relevant texts to study.
When you create a socket and do a bind on that socket and then a listen, what you have is what is called a listening socket.
When a connection is establised this socket is basically cloned to a new socket, and this socket is called the servicing socket the port to which it bound is still the same as the original port.
But there is an important distinction between this socket and the listening socket from before. Namely it is part of a socket pair.
It is the socket pair that uniquely identifies the connection. so as there are 2 sockets in the picture for a socket pair, there are 2 IP adresses and 2 ports for both ends of the TCP communication channel. During the cloning of the servicing socket, the TCP kernel will allocate what is called a TCB and in it it will store those 2 IP# and 2 ports. The TCB also contains the socket number that belongs to the TCB.
Each time a TCP segment comes in , the TCP header is checked and whether or not it is a SYN, for a SYN you would have connection establishment so that you passed already, but then the kernel is going through its list of listening sockets. If it is a normal TCP packet, not a SYN, both port numbers are in the TCP header and the IP# are part of the IP header, so using this information the kernel is able to find the TCP that belongs to this TCP connection. (For a SYN, this information is also there, but as I said, for a SYN you have to process only the listening sockets)
That is in a nutshell how it works.
This information can be found in UNIX Network Programming: the sockets networking API. In there the link to the sockets is described whereas in other reference material it is usually not described that much in detail, rather the nitty grits of TCP are usually highlighted.
When server socket do accept(), it creates a new (client) socket that is bind to port that is different from the port server socket is bind. So socket communication is done via newly bind port, and server socket (for accept() only) is waiting for another client connection on originally bind port.
No.
I think this is not quite proper answer
It is a wrong answer.
because (1) port matches to a single process
That doesn't mean anything relevant.
and (2) socket accept is inside-process matters
Nor does that. It doesn't appear to mean anything at all actually.
and single process can have multiple sockets.
That's true but it doesn't have any bearing on why your answer is wrong. The reason your answer is wrong is because no second port is used.
When server socket do accept(), it creates a new (client) socket that is not bind to any specific port
No. It creates a second socket that inherits everything from the server socket: port number, buffer sizes, socket options, ... everything except the file descriptor and the LISTENING state, and maybe I forgot something else. It then sets the remote IP:port of the socket to that of the client and puts the socket into ESTABLISHED state.
and when client communicates with the server
The client has already communicated with the server. That's why we are creating this socket.
it uses the port that is bind to server socket (who accept()s connections) and which client socket to actually communicate is resolved by (sourceIP, sourcePort, destIP, destPort) tuple from TCP header(?) at Transmission level
This has already happened.
This is also suspicious because I thought socket is somewhat application-level object)
No it isn't. A socket is a kernel-level object with an application-level file descriptor to identity it.
If the socket communications still use server socket's port, i.e. client sends some messages to server socket port, doesn't it uses server socket's backlog queue?
No. The backlog queue is for incoming connect requests, not for data. Incoming data goes into the socket receive buffer.
I mean, how can messages from client be distinguished between connect() and read() or write()?
Because a connect() request sets special bits in the TCP header. The final part of it can be combined with data.
And how can they be resolved to each client sockets in server, WITHOUT any port binding?
Port binding happens the moment the socket is created in the call to accept(). You invented this difficulty yourself. It isn't real.
If one of my scenario is correct, would answer to the questions following?
Neither of them is correct.
Or possibly I'm making two wrong scenarios, so it would be very thankful for you to provide right answers, or at least some relevant texts to study.
Surely you already have relevant texts to study? If you don't, you should read RFC 793 or W.R. Stevens, TCP/IP Illustrated, volume I, relevant chapters. You have several major misunderstandings here.
From the Linux programmer's manual, as found via man 2 accept. Link
The accept() system call is used with connection-based socket
types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection
request on the queue of pending connections for the listening socket,
sockfd, creates a new connected socket, and returns a new file
descriptor referring to that socket. The newly created socket is not
in the listening state. The original socket sockfd is unaffected by
this call.
So what happens is that you have a listening TCP socket. Someone requests to connect().
You then call accept(). The old listening socket remains in listening mode, while a new socket is created in connected mode. Port is the original listening port.
That does not interfere with the listening socket, because the new socket does not listen for incoming connections.
I'm creating a socket table from a network composed of 2 hosts. They have a p2p connection and they're working with tcp protocol. Do I have to create a welcoming socket in both of the hosts (as they act like servers)? And a receiving socket as well as a sending socket for each host? Or just one for sending/receiving would be ok?
Each host will need a listening socket on which it will accept new connections. It will also need one socket for each inbound TCP connection (returned from the accept call when it accepts an inbound connection) and one for each outbound TCP connection (returned from the socket call before calling connect to make the outbound connection).
First of all, is there any problem with using both UDP and TCP on the same server?
Secondly, can I use the same port number?
Yes, you can use the same port number for both TCP and UDP. Many protocols already do this, for example DNS works on udp/53 and tcp/53.
Technically the port pools for each protocol are completely independent, but for higher level protocols that can use either TCP or UDP it's convention that they default to the same port number.
When writing your server, bear in mind that the sequence of events for a TCP socket is much harder than for a UDP socket, since as well as the normal socket and bind calls you also have to listen and accept.
Furthermore that accept call will return a new socket and it's that socket that you'll then have to also poll for receive events. Your server should be prepared to continue accepting connections on the original socket whilst simultaneously servicing multiple clients each of which will be triggering receive events on their own sockets.
Firstly,there is no problem using both tcp and udp on the server.
Secondly,we can have both UDP and TCP requests on same port ,because each request is identified by a quintuple contained by source IP ,Destination IP, Source Port, Destination Port, PROTOCOL(as protocol can be TCP or UDP).