Is there a performance difference between pooling connections or channels in rabbitmq? - queue

I'm a newbie with Rabbitmq(and programming) so sorry in advance if this is obvious. I am creating a pool to share between threads that are working on a queue but I'm not sure if I should use connections or channels in the pool.
I know I need channels to do the actual work but is there a performance benefit of having one channel per connection(in terms of more throughput from the queue)? or am I better off just using a single connection per application and pool many channels?
note: because I'm pooling the resources the initial cost is not a factor, as I know connections are more expensive than channels. I'm more interested in throughput.

I have found this on the rabbitmq website it is near the bottom so I have quoted the relevant part below.
The tl;dr version is that you should have 1 connection per application and 1 channel per thread.
Connections
AMQP connections are typically long-lived. AMQP is an application
level protocol that uses TCP for reliable delivery. AMQP connections
use authentication and can be protected using TLS (SSL). When an
application no longer needs to be connected to an AMQP broker, it
should gracefully close the AMQP connection instead of abruptly
closing the underlying TCP connection.
Channels
Some applications need multiple connections to an AMQP broker.
However, it is undesirable to keep many TCP connections open at the
same time because doing so consumes system resources and makes it more
difficult to configure firewalls. AMQP 0-9-1 connections are
multiplexed with channels that can be thought of as "lightweight
connections that share a single TCP connection".
For applications that use multiple threads/processes for processing,
it is very common to open a new channel per thread/process and not
share channels between them.
Communication on a particular channel is completely separate from
communication on another channel, therefore every AMQP method also
carries a channel number that clients use to figure out which channel
the method is for (and thus, which event handler needs to be invoked,
for example).
It is advised that there is 1 channel per thread, even though they are thread safe, so you could have multiple threads sending through one channel. In terms of your application I would suggest that you stick with 1 channel per thread though.
Additionally it is advised to only have 1 consumer per channel.
These are only guidelines so you will have to do some testing to see what works best for you.
This thread has some insights here and here.
Despite all these guidelines this post suggests that it will most likely not affect performance by having multiple connections. Though it is not specific whether it is talking about client side or server(rabbitmq) side. With the one point that it will of course use more systems resources with more connections. If this is not a problem and you wish to have more throughput it may indeed be better to have multiple connections as this post suggests multiple connections will allow you more throughput. The reason seems to be that even if there are multiple channels only one message goes through the connection at one time. Therefore a large message will block the whole connection or many unimportant messages on one channel may block an important message on the same connection but a different channel. Again resources are an issue. If you are using up all the bandwidth with one connection then adding an additional connection will have no increase performance over having two channels on the one connection. Also each connection will use more memory, cpu and filehandles, but that may well not be a concern though might be an issue when scaling.

In addition to the accepted answer:
If you have a cluster of RabbitMQ nodes with either a load-balancer in front, or a short-lived DNS (making it possible to connect to a different rabbit node each time), then a single, long-lived connection would mean that one application node works exclusively with a single RabbitMQ node. This may lead to one RabbitMQ node being more heavily utilized than the others.
The other concern mentioned above is that the publishing and consuming are blocking operations, which leads to queueing messages. Having more connections will ensure that 1. processing time for each messages doesn't block other messages 2. big messages aren't blocking other messages.
That's why it's worth considering having a small connection pool (having in mind the resource concerns raised above)

The "one channel per thread" might be a safe assumption (I say might as I have not made any research by myself and I have no reason to doubt the documentation :) ) but beware that there is a case where this breaks:
If you you use RPC with RabbitMQ Direct reply-to then you cannot reuse the same channel to consume for another RPC request. I asked for details about that in the google user group and the answer I got from Michael Klishin (who seems to be actively involved in RabbitMQ development) was that
Direct Reply to is not meant to be used with channel sharing either way.
I've email Pivotal to update their documentation to explain how amq.rabbitmq.reply-to is working under the hood and I'm still waiting for an answer (or an update).
So if you want to stick to "one channel per thread" beware as this will not work good with Direct reply-to.

Related

.Net 4.5 TCP Server scale to thousands of connected clients

I need to build a TCP server using C# .NET 4.5+, it must be capable of comfortably handling at least 3,000 connected clients that will be send messages every 10 seconds and with a message size from 250 to 500 bytes.
The data will be offloaded to another process or queue for batch processing and logging.
I also need to be able to select an existing client to send and receive messages (greater then 500 bytes) messages within a windows forms application.
I have not built an application like this before so my knowledge is based on the various questions, examples and documentation that I have found online.
My conclusion is:
non-blocking async is the way to go. Stay away from creating multiple threads and blocking IO.
SocketAsyncEventArgs - Is complex and really only needed for very large systems, BTW what constitutes a very large system? :-)
BeginXXX methods will suffice (EAP).
Using TAP I can simplify 3. by using Task.Factory.FromAsync, but it only produces the same outcome.
Use a global collection to keep track of the connected tcp clients
What I am unsure about:
Should I use a ManualResetEvent when interacting with the TCP Client collection? I presume the asyc events will need to lock access to this collection.
Best way to detect a disconnected client after I have called BeginReceive. I've found the call is stuck waiting for a response so this needs to be cleaned up.
Sending messages to a specific TCP Client. I'm thinking function in custom TCP session class to send a message. Again in an async model, would I need to create a timer based process that inspects a message queue or would I create an event on a TCP Session class that has access to the TcpClient and associated stream? Really interested in opinions here.
I'd like to use a thread for the entire service and use non-blocking principals within, are there anythings I should be mindful of espcially in context of 1. ManualResetEvent etc..
Thank you for reading. I am keen to hear constructive thoughts and or links to best practices/examples. It's been a while since I've coded in c# so apologies if some of my questions are obvious. Tasks, async/await are new to me! :-)
I need to build a TCP server using C# .NET 4.5+
Well, the first thing to determine is whether it has to be base-bones TCP/IP. If you possibly can, write one that uses a higher-level abstraction, like SignalR or WebAPI. If you can write one using WebSockets (SignalR), then do that and never look back.
Your conclusions sound pretty good. Just a few notes:
SocketAsyncEventArgs - Is complex and really only needed for very large systems, BTW what constitutes a very large system? :-)
It's not so much a "large" system in the terms of number of connections. It's more a question of how much traffic is in the system - the number of reads/writes per second.
The only thing that SocketAsyncEventArgs does is make your I/O structures reusable. The Begin*/End* (APM) APIs will create a new IAsyncResult for each I/O operation, and this can cause pressure on the garbage collector. SocketAsyncEventArgs is essentially the same as IAsyncResult, only it's reusable. Note that there are some examples on the 'net that use the SocketAsyncEventArgs APIs without reusing the SocketAsyncEventArgs structures, which is completely ridiculous.
And there's no guidelines here: heavier hardware will be able to use the APM APIs for much more traffic. As a general rule, you should build a barebones APM server and load test it first, and only move to SAEA if it doesn't work on your target server's hardware.
On to the questions:
Should I use a ManualResetEvent when interacting with the TCP Client collection? I presume the asyc events will need to lock access to this collection.
If you're using TAP-based wrappers, then await will resume on a captured context by default. I explain this in my blog post on async/await.
There are a couple of approaches you can take here. I have successfully written a reliable and performant single-threaded TCP/IP server; the equivalent for modern code would be to use something like my AsyncContextThread class. It provides a context that will cause await to resume on that same thread by default.
The nice thing about single-threaded servers is that there's only one thread, so no synchronization or coordination is necessary. However, I'm not sure how well a single-threaded server would scale. You may want to give that a try and see how much load it can take.
If you do find you need multiple threads, then you can just use async methods on the thread pool; await will not have a captured context and so will resume on a thread pool thread. In this case, yes, you'd need to coordinate access to any shared data structures including your TCP client collection.
Note that SignalR will handle all of this for you. :)
Best way to detect a disconnected client after I have called BeginReceive. I've found the call is stuck waiting for a response so this needs to be cleaned up.
This is the half-open problem, which I discuss in detail on my blog. The best way (IMO) to solve this is to periodically send a "noop" keepalive message to each client.
If modifying the protocol isn't possible, then the next-best solution is to just close the connection after a no-communication timeout. This is how HTTP "persistent"/"keep-alive" connections decide to close. There's another possibile solution (changing the keepalive packet settings on the socket), but it's not as easy (requires p/Invoke) and has other problems (not always respected by routers, not supported by all OS TCP/IP stacks, etc).
Oh, and SignalR will handle this for you. :)
Sending messages to a specific TCP Client. I'm thinking function in custom TCP session class to send a message. Again in an async model, would I need to create a timer based process that inspects a message queue or would I create an event on a TCP Session class that has access to the TcpClient and associated stream? Really interested in opinions here.
If your server can send messages to any client (i.e., it's not just a request/response protocol; any part of the server can send messages to any client without the client requesting an update), then yes, you'll need a proper queue of outgoing requests because you can't (reliably) issue multiple concurrent writes on a socket. I wouldn't have the consumer be timer-based, though; there are async-compatible producer/consumer queues available (like BufferBlock<T> from TPL Dataflow, and it's not that hard to write one if you have async-compatible locks and condition variables).
Oh, and SignalR will handle this for you. :)
I'd like to use a thread for the entire service and use non-blocking principals within, are there anythings I should be mindful of espcially in context of 1. ManualResetEvent etc..
If your entire service is single-threaded, then you shouldn't need any coordination primitives at all. However, if you do use the thread pool instead of syncing back to the main thread (for scalability reasons), then you will need to coordinate. I have a coordination primitives library that you may find useful because its types have both synchronous and asynchronous APIs. This allows, e.g., one method to block on a lock while another method wants to asynchronously block on a lock.
You may have noticed a recurring theme around SignalR. Use it if you possibly can! If you have to write a bare-bones TCP/IP server and can't use SignalR, then take your initial time estimate and triple it. Seriously. Then you can get started down the path of painful TCP with my TCP/IP FAQ blog series.

Does listen() backlog affect established TCP connections?

Would it be naive to create a TCP socket with a listen backlog set to minimum as a way of rate limiting new incoming connections? The server workload in question doesn't expect many new connections at any time but spends a lot of time servicing long open persistent connections. It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text. Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
Specifically the platform in use is Linux, and although it may be handled differently in different OSs, I expect them to all behave roughly the same.
EDIT What I mean by the "same" is that backlog doesn't affect established connections, though I do understand Linux discards them while Windows sends a reset.
Does listen() backlog affect established TCP connections?
It affects established connections that the server hasn't accepted yet via accept(), only in the sense that it limits the number of such connections that can exist.
Would it be naive to create a TCP socket with a listen backlog set to minimum as a way of rate limiting new incoming connections?
All it would accomplish would be to unnecessarily fail some connecting clients. They won't get any service until your server gets around to it anyway, and once the backlog queue fills they are rate-limited by your service code anyway. There is no particular reason why shortening the queue would have any beneficial effect. The other problem with the idea is that it isn't readily possible to determine what the minimum actually is, or whether you succeeded in setting it as the backlog queue length.
It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text.
That is correct. There is no reason why it should affect them: that's why you won't find it written down anywhere, any more than the fact that the phase of the moon doesn't affect it either.
Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving
No.
or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
They're not dropped. They simply aren't even created if they won't fit on the backlog queue. Ergo their resource consumption at the server is zero.
Specifically the platform in use is Linux, and although it may be handled differently in different OSs, I expect them to all behave roughly the same.
They don't. On Windows, an incoming connection when the backlog queue is full causes an RST to be issued. On other platforms it is simply ignored.
What you describe are several types of attacks like flooding, syn attacks and other goodies resulting in denial of service.
This topic is not easy, because protection has to be implemented in all the layers, including TCP. For instance a SYN attack, fiddling with the sequence numbers, ... . At that point the packet in question already came a long way, through the ethernet layer and ip layer, bottom line it is taking resources. So if your system is under attack, the attacking packets are in your data stream just like the good ones are. The faster you can detect a packet is faulty and drop it, the better. Usually a system that is under attack will be slower. Well at least the systems that I have worked with.
Some attacks try to bring your system in a faulty state permanently, this by exploiting bugs. For instance TCP has a receive queue, if packets are constantly arriving out of order they will be stored in that receive queue. If the missing packet never arrives, then this receive queue could keep on growing and growing. Without the proper defense , this would lead to the system going completely out of resources.
There are specialised tools (codenumicon for instance) to check the vulnerability of a TCP stack implementation. You can assume that the one on linux has been properly tested using similar tools.
An attack can also occur on the application layer. If you have a TCP server and it allows only a limited amount of sessions. A malicious user can simply take all the connections simply by establishing all the connections and then not doing anything with it. So you have to create some defense as well. Weather or not you set this limit very low or high does not change a thing. A malicious user will try anything to bring your system down. You need to built in defense anyway. You can connect to a webserver (HTTP) simply using telnet. If you don't send anything the server's defense will come into play and close the connection.
So bringing the amount of possible connections to a low value and thinking that this in itself is a form of protection is indeed naive.
Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
They are using resources of your machine and will make your system run slower.
It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text.
If it is normal user trying to establish a connection, even if he is doing it continuously, retrying upon failure. The influence will be minimal, close to nothing. But a malicious user that is flooding connections attempts will have influence on the system performance, because the system has to spent time identifying those flawed packets and dropping them asap.

server push for millions of concurrent connections

I am building a distributed system that consists of potentially millions of clients which all need to keep an open (preferrably HTTP) connection to wait for a command from the server (which is running somewhere else). The load of messages / commmands will not be very high, maybe one message / sec / 1000 clients which means it would be 1000 msg/sec # 1 million clients. => it's basically about the concurrent connections.
The requirements are simple too. One way messaging (server->client), only 1 client per "channel".
I am pretty open in terms of technology (xmpp / websockets / comet / ...). I am using Google App Engine as server, but their "channels" won't work for me unfortunately (too low quotas and no Java client). XMPP was an option but is quite expensive. So far I was using URL Fetch & pubnub, but they just started charging for connections (big time).
So:
Does anyone know of a service out there that can do that for me in an affordable way? Most I have found restrict or heavily charge for connections.
Any experience with implementing such a server yourself? I have actually done that already and it works pretty well (based on Tomcat & NIO) but I haven't had the time yet to actually set up a large load test environment (partially because this is still a fallback solution, I'd prefer a battle hardened msg server). Any experience to how many users you get per GB? Any hard limits?
My architecture also allows to fragment the msg servers, but I'd like to maximize the concurrent connections because the msg processing CPU overhead is minimal.
I have meanwhile implemented my own message server using netty.io. Netty makes use of Java NIO and scales extremely well. For idle connections I get a memory footprint of 500 bytes per connection. I am doing only very simple message forwarding (no caching, storage or other fancy stuff) but with that am easily getting 1000 - 1500 msg / sec (each half a KB) on the small amazon instance (1ECU / 1.6GB).
Otherwise if you are looking for a (paid) service then I can recommend spire.io (they do not charge for connections but have a higher price per message) or pubnub (they do charge for connections but are cheaper per message).
You have to look more in architecture of making such environment.
First of all, if you will write sockets management by yourself, then don't use Thread per Client Socket. Use Asynchronous methods for receiving and sending data.
WebSockets might be too heavy if your messages are small. Because it implements framing, which has to be applied to each message for each socket individually (caching can be used for different versions of WebSockets protocols), that makes them slower to process both directions: for receive and for send, especially because of data masking.
It is possible to create millions of sockets, but only most advanced technologies are capable to do so. Erlang is able to handle millions connections, and is pretty scalable.
If you would like to have millions of connections using other higher level technologies, then you need to think about clustering of what you are trying to accomplish.
For example using gateway server that will keep track of all processing servers. And have data of them (IP, ports, load (if it will be one internal network, firewalling and port forwarding might be handy here).
Client software connects to that gateway server, gateway server checks the least loaded server and sends ip and port to client. Client creates connection directly to working server using provided address.
That way you will have gateway which as well can handle authorization, and wont hold connections for long, so one of them might be enough. And many workers that are doing publishing of data and keeping connections.
This is very related to your needs, and might not be suitable for your solutions.

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the diving up of the input data and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well established heart-beat protocols or that nobody knows about them (which would imply that they aren't so well established after all) in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically I'm going to vote for Matt Davis's answer since it was the closest and he pointed out a good idea to use multicast.
Thank you all for your time~
Distribued Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead-reckoning to provide higher frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it goes that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast since it will span the broadcast domains, providing the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question but you might be interested by the way Weblogic Server clustering work under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Redhat includes a built-in port configuration so that you could implement your own Cisco switch using a cheap RedHat box. Google for "virtual ip address" and "cisco content router" for more information.
In addition to trying hardware load-balancers, you can also try a free-open-source load-balancing software application such as HAProxy, available for Linux and the BSDs.

Scaleability of TCP keep-alive

Consider a large scale, heterogeneous network of various devices. These devices are providing services to others on the network in a peer-to-peer fashion. The mechanism used to track service availabilty across all nodes is currently using TCP sockets marked as keep-alive, usually for the duration the node is online. This leads to every node having a socket open with every other node (within a subnet of the peer-to-peer infrastructure).
What arguments exist regarding the scaleability of using TCP keep-alive in this way?
My alternative approach is to use a publish/subscribe model, where nodes push new services to the network as they become available, and their peers cache them for when they want to subscribe to a service. Does this sound feasable?
I interpret from what you wrote that the communication is strictly point-to-point, with considerable duration ('leases'). If this is true, it means that you will gain nothing in a publish-subscribe model. If this is not true, then yes, you should (could) change the network model to match the communications, and your idea sounds sound.
Regarding your second question, since TCP sockets and keep-alive is just a concept, there is no (or a very small) intrinsic cost of having such a keep-alive contract. In practice YMMV since different socket implementations require different resources, and other actions might be required to keep the channel open. There are however many implementations which require very little resources for open sockets (select()-type for example).
A discovery service (publish/subscribe of services) makes most sense if there are many implementers of the same type of service, and you cannot (or do not want to) predict statically where they will appear.
In short, I would say that you should only change the design if the type of communication that you have fits the current architecture badly. Your idea certainly sounds very feasible, but more information about the communication patterns would be necessary to make an estimation of the outcome.
Yeah using keep alive seems like a bad idea for any P2P network. Not only would I only have connections kept open while data is being transferred I would also keep node state updates on a different sockets altogether so as to not interfere with file transmissions.
If your TCP Keep Alive mechanism is being used only for tracking service availability (meaning, you never communicate service request/response across these connections), the use of TCP sockets is definitely an over kill. TCP sockets do take significant resources.
A more scalable method could be using a publish/subscribe model that uses UDP publish messages at regular intervals to advertise continued existence of the service. You could also use a service-down message published from a disconnecting node to gracefully declare end of service.
Going further, if you mean to get optimal with really large scale networks and, are ready to put in some time and effort -- consider a structured P2P mechanism like DHT.