BitTorrent protocol. Multiple downloads between same peers - sockets

I have just started studying the BitTorrent protocol and have a question about BitTorrent clients that use the same port for all incoming connections.
Here is an example of my problem:
1st connection: A local peer listening on 1.0.0.1:1 receives a connection from a remote peer with address 2.0.0.2:2. OK. Start PWP
2nd connection: A local peer listening on 1.0.0.1:1 receives a connection from a remote peer with address 3.0.0.3:2. OK. Start PWP
3rd connection: A local peer listening on 1.0.0.1:1 receives a connection from a remote peer with address 2.0.0.2:2 but with another InfoHash. ???
Is the 3rd connection possible, or does the remote peer (2.0.0.2:2) have to track its connections to the local peer (1.0.0.1:1), connect to the local peer through a different port, and also announce itself to the tracker a second time with a different port and/or peer id?

Yes, it is possible for two peers to use more than one connection, sharing torrents with different InfoHashes at the same time. There are two different cases depending on what type of transport protocol the connection is done over.
TCP connection
When a BitTorrent peer shares over TCP, it opens an incoming port that is announced to the tracker(s), and that port is the one every other peer connects to when it initiates a connection to that peer. However, a peer that initiates outgoing connections uses a different port number for every outgoing connection.
That makes it possible to uniquely identify every connection by its (IP:PORT<->IP:PORT)-pair.
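As a minimal sketch of that point, the following opens two outgoing connections to the same listening port on the loopback interface and collects the (local address, remote address) pair of each. The function name is made up for illustration, and the loopback address stands in for two real peers.

```python
import socket

def open_two_connections():
    """Connect twice to one listening port and return both address pairs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(2)
    server_addr = listener.getsockname()

    clients, accepted, pairs = [], [], []
    for _ in range(2):
        # The OS assigns a fresh ephemeral source port to each outgoing connection.
        c = socket.create_connection(server_addr, timeout=2.0)
        conn, _peer = listener.accept()
        clients.append(c)
        accepted.append(conn)
        pairs.append((c.getsockname(), c.getpeername()))

    for s in clients + accepted + [listener]:
        s.close()
    return pairs

if __name__ == "__main__":
    for local, remote in open_two_connections():
        print(local, "<->", remote)
```

Both pairs share the same remote IP:PORT, but the local port differs, so the two connections never get mixed up.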
uTP/UDP connection
When a peer shares over uTP/UDP it uses the same port for both incoming and outgoing connections. To make it possible to differentiate between the connections, the uTP protocol extension instead uses a connection_id that is unique for every connection.
The advantage with using the same port for both incoming and outgoing connections is that it makes UDP hole punching possible.
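To illustrate how connection_id can separate several connections that arrive on one UDP port, here is a rough sketch that reads the field from a packet header and builds a lookup key. The 20-byte header layout follows my reading of BEP 29; the helper names and the (address, connection_id) key are assumptions for the example, and a real uTP implementation does considerably more.

```python
import struct

# uTP header as laid out in BEP 29 (20 bytes, big-endian):
# type/version, extension, connection_id, timestamp, timestamp_difference,
# wnd_size, seq_nr, ack_nr
UTP_HEADER = struct.Struct(">BBHIIIHH")

def connection_key(packet: bytes, remote_addr):
    """Return the (remote address, connection_id) key for an incoming packet."""
    if len(packet) < UTP_HEADER.size:
        raise ValueError("packet shorter than a uTP header")
    _type_ver, _ext, conn_id, *_rest = UTP_HEADER.unpack_from(packet)
    return (remote_addr, conn_id)

def make_packet(conn_id: int, payload: bytes = b"") -> bytes:
    """Build a hypothetical data packet (type 0, version 1) for the demo."""
    type_ver = (0 << 4) | 1
    return UTP_HEADER.pack(type_ver, 0, conn_id, 0, 0, 0, 1, 0) + payload

if __name__ == "__main__":
    remote = ("2.0.0.2", 2)
    # Two connections from the same remote IP:PORT, told apart by connection_id.
    connections = {(remote, 1000): "torrent A", (remote, 2000): "torrent B"}
    for cid in (1000, 2000):
        print(connections[connection_key(make_packet(cid), remote)])
```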
The peer makes a separate announce to the tracker(s) for every different InfoHash.
All modern clients reuse the same incoming port for all the torrents they share.
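What lets one incoming port serve many torrents is that every peer wire connection begins with a handshake that names the InfoHash. As a hedged sketch, assuming the 68-byte handshake layout from the BitTorrent specification (BEP 3), the receiving side can read the InfoHash and pick the matching torrent. The function names and the peer id value are illustrative only.

```python
import struct

PSTR = b"BitTorrent protocol"
# pstrlen (1) + pstr (19) + reserved (8) + info_hash (20) + peer_id (20) = 68 bytes
HANDSHAKE = struct.Struct(">B19s8s20s20s")

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Build a handshake with all reserved bits cleared."""
    return HANDSHAKE.pack(len(PSTR), PSTR, bytes(8), info_hash, peer_id)

def parse_info_hash(data: bytes) -> bytes:
    """Return the InfoHash carried by a received handshake."""
    pstrlen, pstr, _reserved, info_hash, _peer_id = HANDSHAKE.unpack(
        data[:HANDSHAKE.size]
    )
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return info_hash

if __name__ == "__main__":
    torrents = {bytes([1]) * 20: "torrent A", bytes([2]) * 20: "torrent B"}
    peer_id = b"-XX0000-" + bytes(12)          # made-up 20-byte peer id
    for info_hash in torrents:
        received = build_handshake(info_hash, peer_id)
        print(torrents[parse_info_hash(received)])
```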

Related

Peer to Peer Networking - with shared public IP and DHCP

I am trying to setup peer to peer networking and am trying to understand how this works.
Normally in a client-to-server connection, I will connect to the server IP and port. Behind the scenes, it will create a client socket bound to a local port at the local IP, and the packet is sent to the router. The router will then NAT the local IP and local port to the client's public IP and a different public port, with the server IP and port as the destination.
When the server responds, the router then DENATs the public client ip and public client port back to the local ip and local port, and the packet arrives at the computer.
In peer-to-peer networking, I may have the peer's public IP, but it is shared by many machines and the router hasn't allowed a connection yet, so there isn't an open port I can send the data to.
One option is for both peers to contact a server. That opens a port on the router. Then the peers send packets to each other's client port.
However, usually the router will only accept packets from the same IP the request was made to, so the two peers cannot reuse the server's connection.
How do the two peers talk to each other in this scenario ?
Peer-to-peer networking works exactly the same way as client/server networking. Only one of the peers will become a server and the other a client.
Normally in a peer-to-peer app like bittorrent all peers are also servers but of course for any individual connection one machine must take the role of the client. However a single peer may have multiple connections. So for any single peer some of the connections to it will be server sockets and some will be client sockets.
How this works with NAT is exactly the same as a client/server architecture. You must configure your router to NAT back to your peer application in order for others to connect to it. If not then your peer can only connect to other peers but other peers cannot connect to you. For example, if your bittorrent client is generally acting slow, not managing to get a lot of connections and not managing to finish downloading some torrents this often signifies that you have not configured your router's port forwarding back to your PC for your bittorrent client.
For the use-case of non-expert users (consumers) there are several ways to get around NAT automatically without requiring your users to configure their routers. The most widely used method is UPnP (Universal Plug and Play). However a lot of more expert users who can configure their own routers often disable UPnP because it is a fairly well known DDoS target. So if you do decide to use UPnP you should make it optional for more advanced users to disable it if they don't want to use it.
For cases where you need a guaranteed connection regardless of router configuration, your app cannot be 100% peer-to-peer. You'd need a relay server that acts as a server to both peers and forwards each packet from the sending client peer to the receiving client peer. Of course, the disadvantage of this is that you now have a fixed cost of maintaining a server to support your app, just like traditional client/server systems, but in this case you're using peer-to-peer to reduce server costs, not eliminate the server.
One example of this "hybrid" approach is cryptocurrencies like Bitcoin and Ethereum. They need a core group of servers to exist in order to work. However, for these protocols the servers run the same software as the clients - they're all just nodes. The only difference is that you don't shut down the servers, whereas most people quit their bitcoin wallet once they're done using it (unless they're mining). Another similar example is the TOR network. There is a set of core TOR nodes that act as the "server" part of the network, ensuring that the network always exists.
You said it yourself: "peers send packets to each other's client port". Therefore, the router will "accept packets from the same IP the request was made to".
Say, Alice is behind router A and Bob is behind router B.
Having learned their public endpoints from a server, Alice will send UDP packets to Bob's public IP, and Bob will send UDP packets to Alice's.
Having seen Alice talk to Bob's IP, router A will accept UDP packets from Bob.
Having seen Bob talk to Alice, router B will accept UDP packets from her as well.
That is, some initial packets might be rejected as coming out of the blue, but after both parties have initiated communication on their side, the routers will have no reason to block what follows.
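The exchange above can be modelled with a toy router that only admits packets from IPs its inside host has already sent to. This is a deliberately simplified sketch: the class, the function and the IP addresses are made up, and real NATs also track ports, protocols and timeouts.

```python
class ToyRouter:
    """Toy NAT filter: inbound packets pass only from already-contacted IPs."""

    def __init__(self):
        self.contacted = set()       # remote IPs the inside host has sent to

    def outbound(self, remote_ip):
        self.contacted.add(remote_ip)

    def inbound_allowed(self, remote_ip):
        return remote_ip in self.contacted

def punch():
    """Replay the Alice/Bob exchange and record whether each packet gets in."""
    alice_ip, bob_ip = "198.51.100.1", "203.0.113.2"    # placeholder addresses
    router_a, router_b = ToyRouter(), ToyRouter()
    delivered = []

    router_b.outbound(alice_ip)                         # Bob sends first
    delivered.append(router_a.inbound_allowed(bob_ip))  # Alice has not sent yet

    router_a.outbound(bob_ip)                           # Alice sends to Bob
    delivered.append(router_b.inbound_allowed(alice_ip))

    router_b.outbound(alice_ip)                         # Bob sends again
    delivered.append(router_a.inbound_allowed(bob_ip))
    return delivered

if __name__ == "__main__":
    print(punch())
```

In this model only the very first packet is dropped; once each side has sent something, both directions pass.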
In terms of Symmetric NAT Traversal using STUN 2003, by sending a packet to Bob, Alice is creating a door for Bob in A. On the other side, by sending a packet to Alice, Bob is creating a door for Alice in B.
The trick in UDP hole punching seems to be for the routers to reuse the same NAT tunnel for different IPs - so that the port discovered by a server is the same as the port reused for direct communication.
We can talk with different IPs from a normal UDP socket (by skipping connect and using sendto), so it's kind of logical that a tunneled socket would be able to do the same.
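A small sketch of the sendto point: one unconnected UDP socket sends to two different receivers, and both see the same source address and port. Everything runs on the loopback interface here, so the two receivers differ by port rather than by IP, and the function name is only for illustration.

```python
import socket

def send_to_two_peers():
    """Send from one UDP socket to two receivers; return what each one saw."""
    peers = []
    for _ in range(2):
        p = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        p.bind(("127.0.0.1", 0))
        p.settimeout(2.0)
        peers.append(p)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.bind(("127.0.0.1", 0))            # one local port for every destination
    for p in peers:
        sender.sendto(b"hello", p.getsockname())   # no connect() involved

    sources = [p.recvfrom(1024)[1] for p in peers]
    local = sender.getsockname()
    for s in peers + [sender]:
        s.close()
    return local, sources

if __name__ == "__main__":
    local, sources = send_to_two_peers()
    print("sender bound to", local)
    print("receivers saw  ", sources)
```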

Can a given destination port be associated with more than one TCP connection?

After doing some search for this, I got to know a few points
We cannot multiplex a port for TCP.
If two connections use the same protocol and have the same destination ports, they must have the same connection.
I am quite confused about how some sites say that TCP can only have one application listening on the same port at one time while others say multiple listening TCP sockets, all bound to the same port, can co-exist, provided they are all bound to different local IP addresses.
Reading the above stuff has left me more confused than ever. Can a destination port be associated with more than one TCP connection?
We cannot multiplex a port for TCP.
This is wrong. You can run multiple TCP connections on the same port, as long as they are unique connections. And it is not very difficult to write code that multiplexes I/O on multiple TCP sockets within the same process.
If two connections use the same protocol and have the same destination ports, they must have the same connection.
This is wrong. A TCP connection is uniquely identified by the combination of protocol + local IP/port + remote IP/port.
Two connections that use the same protocol and same destination IP/port are still unique if they use different source IP/port to connect from. For instance, multiple clients running on the same machine can connect to the same server if they use a different local port to connect from. Which is typically the case, as most clients use a random available local port, selected by the OS, for the outbound connection.
Likewise, two connections that use the same protocol and the same source IP/port are still unique if they connect to different destination IP/port. For instance, multiple clients running on the same machine can use the same local IP/port to connect from if they connect to different servers.
some sites say that TCP can only have one application listening on the same port at one time
This is correct, but only if all of the listeners are trying to use the same local IP/port at the same time. Only 1 listener is allowed on it.
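A quick way to see this is to try it. The sketch below, with a made-up function name, starts one listener and then attempts to bind a second socket to exactly the same IP/port. It assumes default socket options; options such as SO_REUSEPORT change this behaviour on some systems.

```python
import socket

def second_listener_fails():
    """Return True if binding a second listener to the same IP/port is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))         # OS picks a free port
    first.listen()
    addr = first.getsockname()

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(addr)                # same local IP and port as the first
        second.listen()
        return False
    except OSError:                      # typically "address already in use"
        return True
    finally:
        second.close()
        first.close()

if __name__ == "__main__":
    print("second listener refused:", second_listener_fails())
```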
others say multiple listening TCP sockets, all bound to the same port, can co-exist, provided they are all bound to different local IP addresses.
This is correct.
Can a destination port be associated with more than one TCP connection?
Yes. Even if there is only 1 listener on that port, every connection it accepts will be using that same local port on the server side, but a different source IP/port from the client side. This allows multiple clients from different remote machines to connect to the same server at the same time.
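As a sketch of both points (many connections on one port, and multiplexing them in one process), the following runs a tiny echo server with the standard selectors module and connects three clients to it. The function name and client count are arbitrary, and the code assumes each short message arrives in a single read, which is fine for a loopback demo but not for real traffic.

```python
import selectors
import socket

def echo_round(num_clients=3):
    """Echo one message per client over a single listening port."""
    sel = selectors.DefaultSelector()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    port = listener.getsockname()[1]

    clients = []
    for i in range(num_clients):
        c = socket.create_connection(("127.0.0.1", port), timeout=2.0)
        c.sendall(f"client {i}".encode())
        clients.append(c)

    server_side = []     # (local port, peer port) of every accepted socket
    echoed = 0
    while echoed < num_clients:
        events = sel.select(timeout=2.0)
        if not events:
            break                        # nothing happened; give up
        for key, _mask in events:
            if key.fileobj is listener:
                conn, peer = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                server_side.append((conn.getsockname()[1], peer[1]))
            else:
                conn = key.fileobj
                conn.sendall(conn.recv(1024))    # echo on the same connection
                sel.unregister(conn)
                conn.close()
                echoed += 1

    replies = [c.recv(1024).decode() for c in clients]
    for c in clients:
        c.close()
    sel.unregister(listener)
    listener.close()
    sel.close()
    return port, server_side, replies

if __name__ == "__main__":
    print(echo_round())
```

Every accepted socket reports the listener's port as its local port, while the peer ports are all different.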

Understanding the requisites that allow bittorrent peers to connect to each other via TCP

BitTorrent peers connect with each other via TCP (mainly). When a peer A tries to connect to peer B, does peer B also needs to try to connect with A simultaneously so the TCP 3-way handshake happens and they form a connection? If not, why?
Also, I have been studying three bittorrent client implementations. While they start TCP connections with the obtained peers, I noticed none of them opens a TCP socket to listen on the port they are announcing to the tracker. Does it mean no one can initiate connection to them? Shouldn't they create such TCP socket?
When a peer A tries to connect to peer B, does peer B also needs to try to connect with A simultaneously so the TCP 3-way handshake happens and they form a connection? If not, why?
Connection setup is a general TCP feature, not specific to BitTorrent. One side initiates the connection by calling connect on an unconnected socket, and the other side has a listening socket on which it calls accept in a loop to create a connection-specific socket for each accepted incoming connection.
There is a simultaneous open flow for connection setup but that's rarely relevant and the connect/accept flow is used by bittorrent clients.
I noticed none of them opens a TCP socket to listen on the port they are announcing to the tracker.
They generally do and should unless process privileges are insufficient to bind a particular port or another process is already listening on it, in which case they should log a warning at least.
If you used a portscan then you may be seeing firewall or NATs getting in the way rather than the client not having a listening socket open. Instead you could use something like netstat (may need some additional arguments, depending on OS) to show listening sockets.
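If you would rather check from code, one rough alternative is to attempt a TCP connection from the same machine the client runs on, which sidesteps NAT and most firewall effects. This is only a sketch with a hypothetical helper name; it reports whether something accepts connections on that port, not which process owns it.

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # 6881 is only an example; use the port your client announces.
    print(is_listening("127.0.0.1", 6881))
```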
If they truly do not have a listening socket open then yes, that would be a problem since they could not accept incoming connections and only talk to a more limited set of clients (those that do).
Bittorrent being a peer-to-peer protocol means that clients should be equals (peers) which means they should be equally capable of initiating and accepting connections.

Understanding of WebSockets

My understanding is that a socket corresponds to a network identifier, port and TCP identifier. [1]
Operating systems enable a process to be associated with a port (which IIUC is a way of making the process addressable on the network for inbound data).
So a WebSocket server will typically be associated with a port well-known for accepting and understanding HTTP for the upgrade request (like 443) and then use TCP identifiers to enable multiple network sockets to be open concurrently for a single server process and a single port.
Please can someone confirm or correct my understanding?
[1] "To provide for unique names at
each TCP, we concatenate a NETWORK identifier, and a TCP identifier
with a port name to create a SOCKET name which will be unique
throughout all networks connected together." https://www.rfc-editor.org/rfc/rfc675
When a client connects to your server on a given port, the client connection is coming from an IP address and a client-side port number. The client-side port number is automatically generated by the client and will be unique for that client. So, you end up with four items that make a connection.
Server IP address (well known to all clients)
Server port (well known to all clients)
Client IP address (unique for that client)
Client port (dynamically unique for that client and that socket)
So, it is the combination of these four items that makes a unique TCP connection. If the same client makes a second connection to the same server and port, then that second connection will have a different client port number (each connection a client makes will be given a different client port number) and thus the combination of those four items above will be different for that second client connection, allowing its traffic to be completely separate from the first connection that client made.
So, a TCP socket is a unique combination of the four items above. To see how that is used, let's look at how some traffic flows.
After a client connects to the server and a TCP socket is created to represent that connection, then the client sends a packet. The packet is sent from the client IP address and from the unique client port number that that particular socket is using. When the server receives that packet on its own port number, it can see that the packet is coming from the client IP address and from that particular client port number. It can use these items to look up in its table and see which TCP socket this traffic is associated with and trigger an event for that particular socket. This separates that client's traffic from all the other currently connected sockets (whether they are other connections from that same client or connections from other clients).
Now, the server wants to send a response to that client. The packet is sent to the client's IP address and client port number. The client TCP stack does the same thing. It receives the packet from the server IP/port and addressed to the specific client port number and can then associate that packet with the appropriate TCP socket on the client so it can trigger an event on the right socket.
All traffic can uniquely be associated with the appropriate client or server TCP socket in this way, even though many clients may connect to the same server IP and port. The uniqueness of the client IP/port allows both ends to tell which socket a given packet belongs to.
webSocket connections start out with an HTTP connection (which is a TCP socket running the HTTP protocol). That initial HTTP request contains an "upgrade" header requesting the server to upgrade the protocol from HTTP to webSocket. If the server agrees to the upgrade, then it returns a response that indicates that the protocol will be changed to the webSocket protocol. The TCP socket remains the same, but both sides agree that they will now speak the webSocket protocol instead of the HTTP protocol. So, once connected, you then have a TCP socket where both sides are speaking the webSocket protocol. This TCP connection uses the same logic described above to remain unique from other TCP connections to the same server.
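To make the upgrade step concrete, here is a minimal sketch of the server's part of the handshake as defined in RFC 6455: it derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key and builds the 101 response. The function names are mine, and a real server would also validate the request headers before replying.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to webSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )

if __name__ == "__main__":
    # Sample key taken from RFC 6455.
    print(upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this response is sent, the same TCP socket carries webSocket frames instead of HTTP messages.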
In this manner, you can have a single server on a single port that works for both HTTP connections and webSocket connections. All connections to that server start out as HTTP connections, but some are converted to webSocket connections after both sides agree to change the protocol. The HTTP connections that remain HTTP connections will be typical request/response and then the socket will be closed. The HTTP connections that are "upgraded" to the webSocket protocol will remain open for the duration of the webSocket session (which can be long lived). You can have many concurrent open webSocket connections that are all distinct from one another while new HTTP connections are regularly serviced all by the same server. The TCP logic above is used to keep track of which packets to/from the same server/port belong to which connection.
FYI, you may have heard about NAT (Network Address Translation). This is commonly used to allow private networks (like a home or corporate network) to interface to a public network (like the internet). With NAT, a server may see multiple clients as having the same client IP address even though they are physically different computers on a private network. With NAT, multiple computers are routed through a common IP address, but NAT still guarantees that the client IP address and client port number are a unique combination, so the above scheme still works. When using NAT, an incoming packet destined for a particular client arrives at the shared IP address. The IP/port is then translated to the actual client IP address and port number on the private network, and then the packet is forwarded to that device. The server is generally unaware of this translation and packet forwarding. Because the NAT server still maintains the uniqueness of the client IP/client port combination, the server's logic still works just fine even though it appears that many clients are sharing a common IP address. Note, home network routers are usually configured to use NAT since all computers on the home network will "share" the one public IP address that your router has when accessing the internet.
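The translation can be pictured as a two-way lookup table. The sketch below is a toy model, not how any particular router works: the class name, the starting port and the addresses are invented, and real NAT devices also key on protocol and expire idle entries.

```python
import itertools

class ToyNat:
    """Toy NAT table: each private IP/port gets its own public port."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.out = {}     # (private ip, private port) -> public port
        self.back = {}    # public port -> (private ip, private port)

    def translate_outbound(self, private_ip, private_port):
        """Return the public IP/port the server will see for this client."""
        key = (private_ip, private_port)
        if key not in self.out:
            public_port = next(self._ports)
            self.out[key] = public_port
            self.back[public_port] = key
        return (self.public_ip, self.out[key])

    def translate_inbound(self, public_port):
        """Return the private IP/port a reply is forwarded to, or None."""
        return self.back.get(public_port)

if __name__ == "__main__":
    nat = ToyNat("203.0.113.7")
    a = nat.translate_outbound("192.168.1.10", 5000)
    b = nat.translate_outbound("192.168.1.11", 5000)   # same private port
    print(a, b)
    print(nat.translate_inbound(b[1]))
```

Two machines using the same private port still appear to the server as distinct IP/port combinations.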
You will not enable multiple sockets; there is no need for it. You will have multiple connections. It's a little different, but you understand it well. For UDP there's nothing to do, because there are no connections.
In TCP, if two different machines connect to the same port on a third machine, there are two distinct connections because the source IPs differ. If the same machine (or two machines behind NAT or otherwise sharing the same IP address) connects twice to a single remote end, the connections are differentiated by source port; the same machine cannot open two connections from the same source port to the same remote IP and port.

Using one socket for peer to peer communication

I want to write a peer to peer network application and have the following problem.
Two nodes in the network, A and B are trying to establish a connection to each other at the same time. When they both accept the connection of the other, there will be two TCP sockets opened.
Only one socket should be used for the communication between the two, because it is enough to communicate in both directions. What is an elegant solution to this problem?
Thanks!
You should not be trying to establish two connections at the same time. That is a flaw in your p2p design. The two peers need to coordinate with each other (such as by exchanging messages via a central server that they are both connected to and that knows who they both are). A decision needs to be made first about who is listening and who is connecting. One peer only opens a listening socket, and that info gets sent to the other peer so it knows where to connect. If that connection fails (e.g., the listening peer is behind a NAT/firewall), the peers need to be notified and a decision made to swap roles. The previously-connecting peer now opens a listening socket, and that info gets sent to the previously-listening peer so it knows where to connect. If that connection fails (e.g., the now-listening peer is also behind a NAT/firewall), then a direct connection between the two peers is not possible without additional help (NAT hole punching, for instance). In some situations, a direct connection is simply not possible, so data being exchanged between them would have to be proxied through the central server.
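The decision sequence described above can be sketched as a small function. Everything here is hypothetical: try_connect stands for whatever your application does to make one attempt with a given listener and connector, and the returned strings are placeholders for real actions.

```python
def negotiate(try_connect, peers=("A", "B")):
    """Try one role assignment, then the swapped one, then fall back to a relay.

    try_connect(listener, connector) is a hypothetical callback that performs
    a single connection attempt and returns True on success.
    """
    first, second = peers
    for listener, connector in ((first, second), (second, first)):
        if try_connect(listener, connector):
            return f"{connector} connected to {listener}"
    # Neither direction worked: hole punching or a relay is needed.
    return "relay via server"

if __name__ == "__main__":
    # Pretend only B can accept incoming connections.
    print(negotiate(lambda listener, connector: listener == "B"))
```

Because only one peer listens at any moment, the two peers never end up with two sockets for the same pair.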