I've been trying to understand the concept of channels and connections in RabbitMQ. I understand it at a high level: a connection is a real connection, implemented as a TCP socket to the broker, while channels are virtual connections that use the same real connection to communicate. So channels are multiplexed over the same connection.
However, how is this implemented at a low level? Are the TCP sockets non-blocking? I've read that using multiple connections doesn't increase performance; why not? When a channel uses the connection, I imagine the calls are serialized, right? So wouldn't multiple connections allow me to send and receive data faster?
I know I'm missing something here so that's why I'm asking for some clarification.
Thanks.
Whether a server or client uses non-blocking sockets is an implementation detail. An implementation that needs high performance probably uses non-blocking sockets; the RabbitMQ server, for example, uses the usual lightweight Erlang process model to achieve concurrency.
You are free to use multiple AMQP connections, though in most cases you should be fine with one connection and multiple channels. TCP connections have relatively high overhead (handshake, per-connection state and buffers), and multiplexing channels on top of a single TCP connection reduces this overhead.
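For illustration, here is a minimal sketch with the Python pika client (the broker address and the queue names are placeholders) showing one TCP connection carrying two independent channels:

```python
import pika

# One real TCP connection to the broker.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))

# Two lightweight channels multiplexed over that single connection.
ch_orders = connection.channel()
ch_logs = connection.channel()

ch_orders.queue_declare(queue="orders")
ch_logs.queue_declare(queue="logs")

ch_orders.basic_publish(exchange="", routing_key="orders", body=b"order #1")
ch_logs.basic_publish(exchange="", routing_key="logs", body=b"started")

connection.close()  # closes both channels along with the underlying TCP socket
```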
Related
Question
Suppose I open websocket connections /A, /B and /C from one client to the same server, and messages are constantly being sent by the client on each ws connection.
If the packet carrying a message for ws connection /A times out and ends up needing to be retransmitted in the underlying TCP layer, is there any chance that this will impact messages being transmitted by the client on ws connections /B and /C, in the way that multiple packets sent over a single TCP connection are delayed when an earlier one times out?
Or is there a chance that each ws connection receives its own TCP connection and so any congestion on one does not impact the packets carrying messages on the others? Could it be implementation-specific?
Overview
After an initial handshake over HTTP, Websockets send their data over a TCP/IP connection.
My assumptions regarding TCP Connections:
For a specific TCP connection on a given port, the connection guarantees that all packets sent by the sender through it will be received by the receiver, and will be received in order. It is a reliable, ordered connection.
This means if two packets are sent over a TCP connection by the sender and the first one does not arrive within a given timeout, the second one is delayed until the first one has been successfully retransmitted by the sender. As far as the receiver is concerned, they always see packet one, then two.
If you create two separate TCP connections on two separate ports, then packets lost on one connection naturally have no impact on packets on the other connection. The reliable ordered guarantee only applies to the packets within one TCP connection, not all of them.
Since an active websocket connection runs on top of a TCP connection, are there any assumptions we can make when we open multiple parallel websocket connections between the same client and server?
If I have a websocket javascript client opening two or more websocket connections to the same server, does the underlying implementation use only a single TCP connection?
Is there a chance that this may be implementation-specific, or is it simply guaranteed that for a given websocket server all connections will occur on the same underlying TCP connection?
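Not an authoritative answer for browsers, but as a point of reference, here is a sketch with the Python websockets library (the host, port, and paths /A, /B, /C are placeholders); in this particular client library, each connect() call performs its own handshake over its own socket:

```python
import asyncio
import websockets

async def stream(path: str) -> None:
    # Each connect() does its own HTTP upgrade handshake, i.e. each of
    # these client connections rides on its own TCP connection in this library.
    async with websockets.connect(f"ws://localhost:8765{path}") as ws:
        for i in range(100):
            await ws.send(f"{path} message {i}")
            await asyncio.sleep(0.04)  # roughly 40 ms between messages

async def main() -> None:
    # Three parallel WebSocket connections from the same client.
    await asyncio.gather(stream("/A"), stream("/B"), stream("/C"))

asyncio.run(main())
```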
Context
The context here is a networked multiplayer game in the browser, where the desired behavior would be to have multiple parallel data streams where any timeout or latency on one stream has no impact on the packets sent on the others.
Of course, when low latency is desirable for multiplayer games you generally want to use UDP instead of TCP, but there is currently no well-supported, cross-browser option for that, as far as I am aware. In an environment where UDP sockets are an option, you would implement your own data streams with varying reliability/ordering guarantees on top of UDP.
However, UDP-like low latency (in the sense that one packet does not block another) can be achieved by guaranteeing that a TCP connection only ever has one packet in flight, and using more connections to allow packets in parallel (source). Naturally that loses out on some optimizations, but it may be desirable if latency is the primary variable being optimized for.
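Not from the original post, but as a rough asyncio sketch of that one-in-flight idea, under the assumption of a hypothetical server that answers every length-prefixed message with a 1-byte ack (the host, port, and pool size are made up):

```python
import asyncio
import socket

class OneInFlightPool:
    """Hypothetical sketch: N TCP connections, each carrying at most one
    un-acked message, so a retransmission stall on one connection does
    not delay messages that can go out on the others."""

    def __init__(self, host: str, port: int, size: int = 4):
        self.host, self.port, self.size = host, port, size
        self.idle: asyncio.Queue = asyncio.Queue()

    async def start(self) -> None:
        for _ in range(self.size):
            reader, writer = await asyncio.open_connection(self.host, self.port)
            sock = writer.get_extra_info("socket")
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # send small packets immediately
            await self.idle.put((reader, writer))

    async def send(self, payload: bytes) -> None:
        # Take an idle connection; the others stay free for parallel sends.
        reader, writer = await self.idle.get()
        writer.write(len(payload).to_bytes(2, "big") + payload)
        await writer.drain()
        await reader.readexactly(1)  # wait for the server's 1-byte ack (assumed protocol)
        await self.idle.put((reader, writer))

async def main() -> None:
    pool = OneInFlightPool("localhost", 9000)
    await pool.start()
    await asyncio.gather(*(pool.send(f"update {i}".encode()) for i in range(16)))

asyncio.run(main())
```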
What is the performance (I mean latency while sending all messages, and the maximum fan-out rate for many messages to many receivers) of ZMQ compared to "simple" UDP and its multicast implementation?
Assume I have one static 'sender' which has to send messages to many, many 'receivers'. The PUB/SUB pattern with the simple TCP transport seems very convenient for such a task: ZMQ does many things without our effort, and one ZMQ socket is enough to handle even numerous connections.
But what I am afraid of is this: ZMQ could create many TCP sockets in the background, even if we don't "see" them. That could create latency. However, if I create a "common" UDP socket and transmit all my messages with multicast, there would be only one socket (multicast), so I think the latency problem would be solved. To be honest, I would like to stay with ZMQ and PUB/SUB over TCP. Are my concerns valid?
I don't think you can really compare them in that way. It depends on what is important to you.
TCP provides reliability, and as the sender you can choose whether avoiding loss or latency matters more to you by setting your block/retry options on send.
mcast saves network bandwidth, especially if your network has multiple segments/routers.
Other options in zeromq
Use zmq_proxy instances to split/share the load of TCP connections
Use pub/sub with pgm/epgm, which is just a layer over multicast (I use this); see the sketch after this list
Use the new radio/dish pattern (with this you have limited subscription options)
http://api.zeromq.org/4-2:zmq-udp
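For reference, a minimal pyzmq sketch of PUB/SUB; the endpoint, topic, and the commented epgm address are placeholders, and the epgm variant additionally needs a libzmq build with PGM support:

```python
import time
import zmq

ctx = zmq.Context()

# Publisher side: one PUB socket, any number of subscribers connect to it.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")
# For option 2 above, bind to a multicast endpoint instead, e.g.
# pub.bind("epgm://eth0;239.192.1.1:5556")   # needs a PGM-enabled libzmq

# Subscriber side (normally a separate process, often on another machine):
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"prices")

time.sleep(0.5)                    # give the subscription time to propagate (slow-joiner)
pub.send_multipart([b"prices", b"hello subscribers"])
print(sub.recv_multipart())        # [b'prices', b'hello subscribers']
```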
Behind the scenes, a TCP "socket" is identified (simplified) by both the "source" and the "destination", so there will be a separate socket for each peer you're communicating with (for a fuller description of how a socket is set up and identified, see here and here). This has nothing to do with ZMQ; ZMQ will set up exactly as many sockets as TCP requires. You can optimize this if you choose to use multicast, but ZMQ does not optimize this for you by using multicast behind the scenes, except for PUB/SUB (see here).
Per James Harvey's answer, ZMQ has added support for UDP, so you can use that if you do not need or want the overhead of TCP, but based on what you've said, you'd like to stay with TCP, which is likely the better choice for most applications (i.e. we often unconsciously expect the reliability of TCP when we're designing our applications, and should only choose UDP if we know what we're doing).
Given that, you should assume that ZMQ is as efficient as TCP enables it to be with regards to low-level socket management. In your case, with PUB/SUB, you're already using multicast. I think you're good.
I want to create a P2P network with the following characteristics:
low latency is not really important
losing packets is okay
the nodes would only send tiny amounts of data around
there will be no NAT/firewall issues, every node has an open port on its public ip
every node is connected to every other node
Usually I would use TCP for anything not time-critical, but the last requirement causes the nodes to have lots of open connections for a long time. If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections. UDP, on the other hand, would only require a single port for each node.
So my question is: Is TCP able to handle the above requirements in a network with e.g. 1000 nodes without tweaking the system? Would UDP be better suited in this case? Is there anything else that would be a deal-breaker for either protocol?
With UDP you control the "connection state" and it is pretty much the best way to do anything peer to peer related IF you have a high number of nodes or care about bandwidth, memory and CPU overhead. By moving all the control to your application in regards to the "connection state" of each node you minimize the amount of wasted resources by making it fit your needs exactly.
You will bypass a lot of operating-system-specific weirdness that limits the effectiveness of TCP with high numbers of connections. There is TIME_WAIT bloat and tens to hundreds of OS-specific settings which will need tweaking for every user of your P2P app if it needs those high numbers. A test app I made, which allowed you to use either UDP with acks or TCP, showed only a 10% difference in performance across operating systems when using UDP. TCP performance was always lower than the best UDP, and it varied wildly, by over 600%, depending on the OS. With tweaks you can make most OSes perform roughly the same using TCP, but by default most are not properly tweaked.
So in my opinion it is harder to make a reliable UDP P2P network than a TCP one, but it is often needed. However, I would only advise taking that route if you are quite experienced with networking, as there are a lot of "gotchas" to deal with. There are libraries which help with this, like RakNet or ENet. They provide ways to do reliable UDP, but it still takes a fair amount of networking knowledge to know how it all ties together, whereas with TCP it is mostly hidden from you.
In a peer-to-peer network you often have messages like NODE PINGs where you may not care whether every single one is received; you just care whether you have received one recently. I.e. you may send a ping every 10 seconds and disconnect the node after 60 seconds of no ping. That would mean 6 ping packets in a row would have to fail, which is highly unlikely unless the node is really down. If you received even one ping in that 60-second period, then the node is still active. A TCP implementation of this would involve more latency and bandwidth, as it makes sure EACH ping message gets through and will block any other data going out until it does. And since you cannot rely on TCP to reliably tell you whether a connection is dead, you are forced to add similar PING features for TCP anyway, on top of all the extra work TCP is already doing with your packets.
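As a rough sketch of that ping scheme with a single UDP socket (the peer address, port and packet format are made up; the 10 s / 60 s numbers are from the text above):

```python
import socket
import time

PING_INTERVAL = 10   # seconds between pings
PEER_TIMEOUT = 60    # drop a peer after this long without hearing from it

peers = {("198.51.100.7", 9999): time.monotonic()}   # peer address -> last time we heard from it

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))
sock.settimeout(1.0)

last_ping = time.monotonic() - PING_INTERVAL   # send the first round of pings immediately
while True:
    now = time.monotonic()

    # Fire-and-forget pings; losing a few of them is fine.
    if now - last_ping >= PING_INTERVAL:
        for addr in peers:
            sock.sendto(b"PING", addr)
        last_ping = now

    # Note who we heard from recently.
    try:
        data, addr = sock.recvfrom(1500)
        peers[addr] = now
    except socket.timeout:
        pass

    # Drop peers that have been silent too long.
    peers = {a: t for a, t in peers.items() if now - t < PEER_TIMEOUT}
```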
Games also often have data that is no big deal if a client misses it, because more packets are coming in a few milliseconds that will supersede any missed ones. E.g. a player is moving from A to Z over a time span of 1 second, and his client sends out a packet roughly every 40 milliseconds: ABCDEFG__I__KLMNOPQRSTUVWXYZ. Do we really care if we miss "H" and "J" when we are receiving updates every 40 ms? Not really; this is where prediction can come into it, but that is usually not relevant to most P2P projects. If that were TCP instead of UDP, it would increase bandwidth requirements and add latency to the rest of the packets being received, as the data is resent until it arrives, on top of the extra latency already added by acking everything.
Essentially you can lower latency and network overhead for many messages in a peer to peer network using UDP. However there will always be some messages which NEED to be sent reliably and that requires you to basically implement some reliable way to get packets to that node, similar to that of TCP. And this is where you need some level of expertise if you want a reliable peer to peer network. Some things to look into include sequencing packets with a number, message ACKs, etc.
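To make "sequence numbers and ACKs" concrete, here is a bare-bones hypothetical sketch of a sender that retransmits a datagram until the peer acks its sequence number (the header layout, peer address, and retry policy are all invented for the example):

```python
import socket
import struct

PEER = ("198.51.100.7", 9999)   # placeholder peer address
HEADER = struct.Struct("!BI")   # 1-byte type (0=DATA, 1=ACK), 4-byte sequence number

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(0.2)            # retransmission timeout, arbitrary for the sketch

def send_reliable(seq: int, payload: bytes, retries: int = 10) -> bool:
    """Send DATA(seq) and retransmit until an ACK(seq) arrives."""
    packet = HEADER.pack(0, seq) + payload
    for _ in range(retries):
        sock.sendto(packet, PEER)
        try:
            data, _addr = sock.recvfrom(1500)
        except socket.timeout:
            continue                         # lost packet or lost ack: resend
        kind, acked = HEADER.unpack_from(data)
        if kind == 1 and acked == seq:
            return True                      # peer confirmed this sequence number
    return False                             # give up and report failure to the caller

# The receiving side would unpack DATA packets, remember which sequence
# numbers it has already seen (to drop duplicates), and reply with
# HEADER.pack(1, seq) as the ACK.
```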
If you care a lot about efficiency or really need tens of thousands of connections, then implementing your specific protocol in UDP will always be better than TCP. But there are cases to be made for TCP, for example if the time to build the project matters or if you are new to network programming.
If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections.
You remember wrong.
Take a web server which is listening on port 80 and can handle 1000s of connections at the same time on this single port. This is because a connection is defined by the tuple of {client-ip,client-port,server-ip,server-port}. And while server-ip and server-port are the same for all connections to this server the client-ip and client-port are not. Even if the client-ip is the same (i.e. same client) the client would pick a different source port.
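You can see this with a few lines of Python: two client connections to the same server port get different ephemeral source ports, so the two 4-tuples differ (example.com:80 is just a convenient target):

```python
import socket

# Two separate TCP connections from the same client to the same server port.
a = socket.create_connection(("example.com", 80))
b = socket.create_connection(("example.com", 80))

# Same destination (server-ip, server-port) for both...
print("peer A:", a.getpeername(), " peer B:", b.getpeername())
# ...but the OS picks a different ephemeral source port for each connection,
# so the two {client-ip, client-port, server-ip, server-port} tuples are distinct.
print("local A:", a.getsockname(), " local B:", b.getsockname())

a.close()
b.close()
```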
... with e.g. 1000 nodes without tweaking the system?
This depends on the system since each of the open connections needs to preserve the state and thus needs memory. This might be a problem for embedded systems with only little memory.
In any case: if your protocol just sends small messages, and if packet loss, reordering or duplication are acceptable, then UDP might be the better choice, because the overhead (connection setup, ACKs, ...) is smaller and it takes less memory. You could also use a single socket to exchange data with all 1000 nodes, whereas with TCP you would need a separate socket for each connection (a socket is not the same as a port!). Using only a single socket might allow for a simpler application design.
I want to amend the answer by Steffen with a few points:
1000 connections are nothing for any normal computer and OS.
UDP fits your requirements. It might be easier to program because it is message oriented. TCP provides a stream of bytes. You need to layer a messaging protocol on top of that which is not that easy. Also, you need to handle broken TCP connections by reconnecting.
Ports are not scarce. No problem with consuming 1000 ports.
I am working on a client/server application. I have read many articles on this and found a very common statement that "creation/deletion of a socket is a very expensive process in terms of system resources". But nowhere is it explained how it consumes so many resources.
Can anybody shed some light on this?
Creating a socket is cheap. Connecting it actually creates the connection, which is more or less as expensive as setting up the underlying connection, especially a TCP connection: establishing one requires the three-way TCP handshake. Keeping connections alive costs mainly memory, and network connections themselves are a resource limited by the operating system (for example, the number of sockets on a port).
If you are using a threaded model, additional resources are needed for thread creation.
I found a useful link related to your question, "Network Programming: to maintain sockets or not?" on Stack Overflow, and a useful article, "Boost socket performance on Linux".
I think they will be helpful to you.
What are the performance considerations that one should take into account when designing a server application that listens on one port? I know that it is possible for many thousands of clients to connect to a server on a single port, but is performance negatively affected by having the server application accept all incoming requests on a single port?
Is my understanding correct that this model (server listening on one port which handles incoming connections, and then responds back over the outbound connection that was created when the client connection was established) is the way databases/webservers etc work?
Regards,
Brian
It does not matter whether the server listens on one port or multiple ports. The server still has to establish a full socket connection for each client it accepts. The OS still has to route inbound packets to the correct socket endpoints, and sockets are uniquely identified by the combination of the IP/port pairs of both endpoints, so there are no performance issues if the server endpoints use different ports.
Any performance issues are going to be in the way the server's code handles those socket connections. If it only listens on one port and accepts clients on that port using a simple accept() loop in a single thread, then the rate at which it can accept clients is limited by that loop. Typically, servers spawn worker threads for each accepted client, which has performance overhead of its own if thread pooling is not used. If the server needs to handle a lot of clients simultaneously, then it should use overlapped I/O or I/O Completion Ports to handle the connections more efficiently.
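For illustration, a minimal sketch of the accept() loop with a worker thread pool described above (the port and the echo handler are placeholders; overlapped I/O and I/O Completion Ports are Windows-specific and not shown here):

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def handle_client(conn: socket.socket, addr) -> None:
    # Placeholder worker: echo whatever the client sends, then close.
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)

def serve(port: int = 8080, workers: int = 32) -> None:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", port))
    server.listen(128)                      # backlog of pending connections

    # One listening socket on one port; the accept() loop is the only serial
    # part, and each accepted client gets its own connected socket.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            conn, addr = server.accept()
            pool.submit(handle_client, conn, addr)

serve()
```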