Should I use separate connections for Pub and Sub with Redis? - event-handling

I have noticed that Socket.io uses two separate connections to the Redis server, one for Pub and one for Sub. Is this something that could improve performance? Or is it purely a move towards more organized event handlers and code? What are the benefits and drawbacks of two separate connections versus a single connection for publishing and subscribing?
P.S. The system is pushing about an equal number of messages that it is receiving. It pushes updates to the servers, which are on the same level in the hierarchy, so there is no master, pushing all of the updates, or slave, consuming the messages. One server would have about 4-8 subscriptions and it will send the messages back to these servers.
P.P.S. Is this more of a job for a purpose-built job queue? The reason I am looking at Redis is that I am already keeping some shared objects in it, which are used by all servers. Is a message queue worth adding yet another network connection?

You are required to use two connections for pub and sub. A subscriber connection cannot issue any commands other than subscribe, psubscribe, unsubscribe, punsubscribe (although @Antirez has hinted at a subscriber-safe ping in the future). If you try to do anything else, Redis tells you:
-ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / QUIT allowed in this context
(note that you can't test this with redis-cli, since that understands the protocol well enough to prevent you from issuing commands once you have subscribed - but any other basic socket tool should work fine)
This is because subscriber connections work very differently - rather than working on a request/response basis, incoming messages can now come in at any time, unsolicited.
publish is a regular request/response command, so must be sent on a regular connection, not a subscriber connection.
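To make the rule concrete, here is a toy Python model of the per-connection state described above. It is not a Redis client and does not speak the real protocol: `ToyConnection` and its simplified `+OK` reply are invented for illustration, and only the error string is taken from the answer.

```python
# Toy model only: mimics the subscriber-mode rule, not the real Redis protocol.
ALLOWED_WHEN_SUBSCRIBED = {"SUBSCRIBE", "PSUBSCRIBE", "UNSUBSCRIBE", "PUNSUBSCRIBE", "QUIT"}
ERROR = "-ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / QUIT allowed in this context"


class ToyConnection:
    """Hypothetical stand-in for one connection's server-side state."""

    def __init__(self):
        self.channels = set()  # channels this connection is subscribed to

    def execute(self, command, *args):
        name = command.upper()
        # Once subscribed, the connection accepts only the subscription commands.
        if self.channels and name not in ALLOWED_WHEN_SUBSCRIBED:
            return ERROR
        if name == "SUBSCRIBE":
            self.channels.update(args)
        elif name == "UNSUBSCRIBE":
            if args:
                self.channels.difference_update(args)
            else:
                self.channels.clear()  # no arguments: leave every channel
        return "+OK"  # simplified reply; real replies differ per command


sub, pub = ToyConnection(), ToyConnection()
sub.execute("SUBSCRIBE", "updates")
print(sub.execute("PUBLISH", "updates", "hi"))  # rejected: connection is in subscriber mode
print(pub.execute("PUBLISH", "updates", "hi"))  # accepted on the second connection
```

This is why a server that both publishes and subscribes ends up holding two connections: one parked in subscriber mode, one free for publish and everything else.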

Related

ZeroMQ mixed PUB/SUB DEALER/ROUTER pattern

I need to do the following:
multiple clients connecting to the SAME remote port
each of the clients open 2 different sockets, one is a PUB/SUB, the
other is a ROUTER/DEALER ( the server can occasionally send back to client heartbeats, different server related information ).
I am completely lost whether it can be done in ZeroMQ or not.
Obviously if I can use 2 remote ports, that is not an issue, but I fail
to understand if my setup can be achieved with some kind of envelope
usage in ZeroMQ.
Can it be done?
Thanks,
Update:
To clarify what I wish to achieve.
Multiple clients can communicate with the server
Clients operate on request-response basis mostly(on one socket)
Clients create a session socket, which means that whenever this
type of socket is created, a separate worker thread needs to be created
and from that time on the client communicates with this worker thread
with regards to requests processing, e.g. server thread must not block
the connection of other clients when dealing with the request of one client
However clients can receive occasional messages from the worker thread with regards to heartbeats of the worker.
Update2:
Actually I could sort it out. What I did:
identify clients obviously, so ROUTER/DEALER is used, e.g. clients
are indeed dealers, hence async processing is provided
clients send messages to the one and only local port, where the router sits
router peeks into messages (kinda the lazy pirate example), checks whether a new client comes in; if yes it offloads to a separate thread, and connects the separate thread with an internal "inproc:" socket
router obviously polls for the frontend and all connected clients' backends and sends messages back and forth.
What bugs me is that it is an overkill if I compare this with a "regular" socket solution, where I could have connected the client with the worker thread DIRECTLY (e.g. worker thread could recv from the socket opened by the client directly), hence I could spare the routing completely.
What am I missing?
There was a discussion on the ZeroMQ mailing list recently about multiplexing multiple services on one TCP socket. The proposed solution is essentially what you implemented.
The discussion also mentions Malamute with its brokers which essentially provides a framework based on ZeroMQ which also provides the functionality you need. I haven't had the time to look into it myself, but it looks promising.
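The routing scheme from Update2 can be sketched without ZeroMQ at all, which may help show what the router is actually doing. The following is a standard-library stand-in, not pyzmq code: `Router`, its per-client `queue.Queue` inboxes (playing the role of the "inproc:" sockets), and the `done:` reply format are all assumptions made for illustration.

```python
import queue
import threading


class Router:
    """Stand-in for the ROUTER loop: one worker thread per client identity."""

    def __init__(self):
        self.inboxes = {}             # client id -> queue feeding that client's worker
        self.replies = queue.Queue()  # (client id, reply) pairs heading back to clients
        self.threads = []

    def _worker(self, client_id, inbox):
        while True:
            message = inbox.get()
            if message is None:       # sentinel: shut this worker down
                break
            self.replies.put((client_id, f"done:{message}"))

    def route(self, client_id, message):
        # Peek at the sender identity; a new client gets its own worker thread.
        if client_id not in self.inboxes:
            inbox = queue.Queue()
            self.inboxes[client_id] = inbox
            thread = threading.Thread(target=self._worker, args=(client_id, inbox))
            thread.start()
            self.threads.append(thread)
        self.inboxes[client_id].put(message)

    def close(self):
        for inbox in self.inboxes.values():
            inbox.put(None)
        for thread in self.threads:
            thread.join()


router = Router()
router.route("client-a", "job-1")
router.route("client-b", "job-2")
router.route("client-a", "job-3")  # reuses client-a's existing worker
router.close()
```

A slow request from one client only blocks that client's worker, so the routing hop is the price paid for sharing a single port.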

How to prevent sending same data to different clients in REST api GET?

I have 15 worker clients and one master connected through the internet. Jobs and data are passed through a REST API in JSON format.
Jobs are not restricted to any particular client. Any worker can query for available jobs at a regular interval (say, 30 seconds), process them, and update their status.
In this scenario, how can I prevent the same records from being sent to different clients on a GET request?
These are the approaches I have considered:
Take the top 5 unprocessed records from the database, mark them as SENT, and expose them via REST GET.
But the problem is that this creates inconsistency. Sometimes the client doesn't get the data due to a network connectivity issue, but on the server the records are still marked as SENT. So no other client can get that data; it will remain SENT forever.
Get the list from the server, and reply to the server with the list of job IDs received. But during this time gap, other clients may also get the same set of jobs.
You've stumbled upon a fundamental problem in distributed systems: there is no way to know if the other side received your message. You can certainly improve the situation with TCP and ack messages. But if you never get the ACK, did the message never arrive, did it arrive but the recipient die before processing it, or did the recipient send the ACK and the ACK get dropped?
That means you need to design your system to handle receiving data more than once.
You offer two partial solutions; if you combine them, your solution starts to look like how SQS works. Mark the item as pending_ack with a timestamp. After the client replies, it is marked sent. Any pending_ack items past a certain time period are eligible to be resent.
Pick your time period to allow for slow network and slow clients and it boils down to only sending duplicates when you really don't know if the client died or not.
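The pending_ack scheme above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a production design: `JobStore`, its method names, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all hypothetical, and a real server would keep this state in the database.

```python
class JobStore:
    """In-memory sketch of the pending_ack idea; names are illustrative."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout  # seconds a client has to acknowledge
        self.jobs = {}                  # job id -> {"state": ..., "leased_at": ...}

    def add(self, job_id):
        self.jobs[job_id] = {"state": "new", "leased_at": None}

    def fetch(self, now, limit=5):
        """Hand out new jobs, plus pending_ack jobs whose timeout has passed."""
        batch = []
        for job_id, job in self.jobs.items():
            expired = (
                job["state"] == "pending_ack"
                and now - job["leased_at"] >= self.ack_timeout
            )
            if job["state"] == "new" or expired:
                job["state"] = "pending_ack"
                job["leased_at"] = now
                batch.append(job_id)
                if len(batch) == limit:
                    break
        return batch

    def ack(self, job_id):
        """Client confirmed receipt: the job is never handed out again."""
        job = self.jobs[job_id]
        if job["state"] == "pending_ack":
            job["state"] = "sent"


store = JobStore(ack_timeout=60)
store.add("job-1")
store.add("job-2")
first = store.fetch(now=0)    # both jobs go out and become pending_ack
store.ack("job-1")            # only job-1 is confirmed
second = store.fetch(now=10)  # nothing: job-2 is still within its timeout
third = store.fetch(now=61)   # job-2 is resent because no ack arrived in time
```

A duplicate only happens in the last case, where the server genuinely cannot tell whether the first client is alive, so workers still need to tolerate processing a job twice.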
Maybe you should reconsider the approach of blocking resources. A REST architecture is, by definition, not obliged to keep information about the client. Instead, you may want to consider optimistic concurrency control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control).
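As a rough illustration of optimistic concurrency control applied to job claiming, the sketch below lets any worker read a job but only lets a claim succeed if the job's version is unchanged since that read. `JobTable`, `VersionConflict`, and the version counter are assumptions for this example; in a database this would typically be a conditional UPDATE on a version column.

```python
class VersionConflict(Exception):
    """Raised when another worker changed the row since it was read."""


class JobTable:
    """In-memory stand-in for a table with a version column."""

    def __init__(self):
        self.rows = {}  # job id -> (version, owner)

    def add(self, job_id):
        self.rows[job_id] = (0, None)

    def read(self, job_id):
        return self.rows[job_id]

    def claim(self, job_id, expected_version, worker):
        version, _ = self.rows[job_id]
        if version != expected_version:
            raise VersionConflict(f"{job_id} is now at version {version}")
        self.rows[job_id] = (version + 1, worker)


table = JobTable()
table.add("job-1")
seen_by_a, _ = table.read("job-1")
seen_by_b, _ = table.read("job-1")       # both workers see version 0
table.claim("job-1", seen_by_a, "worker-a")
try:
    table.claim("job-1", seen_by_b, "worker-b")
except VersionConflict:
    pass                                  # worker-b lost the race and moves on
```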

Topics in ZeroMQ REP sockets

When using ØMQ socket of type SUB, one may use
sub_socket.setsockopt_string(zmq.SUBSCRIBE, 'topic')
Is the same possible also with REP sockets, allowing a worker to only handle specific topics, leaving other topics to different workers?
I'm very afraid that it is impossible, quoting http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/patterns/pubsub.html:
In the current versions of ØMQ, filtering happens at the subscriber side, not the publisher side.
But still, I'm asking if there is some trick to achieve that, because such a functionality would have a huge impact on my infrastructure.
Nope. Can I assume that you've got a REQ or DEALER server socket that sends work to REP workers, that then respond with the completed work back to the server? And that you're looking for a way to make your server communicate to specific clients rather than just pass out tasks in a round-robin fashion?
Can't do it. See here, those sockets are only, always, round-robin. If you want to communicate with a specific client, you must either have a socket that talks only to that client, or you must start the communication from the client (switch your socket pairing so the worker requests whatever work it's ready for, the server responds with it, and then the worker creates a new request with the completed work). Doing anything else gets much more complicated.
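The "worker asks for the work it is ready for" inversion can be sketched independently of ZeroMQ. The code below is a plain-Python model of the server's side of that exchange; `TaskServer`, `request_work`, and the topic names are hypothetical, and in a real setup each `request_work` call would be a request message sent by the worker.

```python
from collections import deque


class TaskServer:
    """Holds pending tasks per topic and answers worker requests."""

    def __init__(self):
        self.pending = {}  # topic -> deque of tasks waiting for a worker

    def submit(self, topic, task):
        self.pending.setdefault(topic, deque()).append(task)

    def request_work(self, topics):
        """Called on behalf of a worker: return one task from a topic it handles."""
        for topic in topics:
            tasks = self.pending.get(topic)
            if tasks:
                return topic, tasks.popleft()
        return None  # nothing available for this worker right now


server = TaskServer()
server.submit("images", "resize-1")
server.submit("mail", "send-1")
image_job = server.request_work(["images"])  # an image worker only sees image tasks
mail_job = server.request_work(["mail"])
idle = server.request_work(["images"])       # no image tasks left
```

Because the worker names its topics in the request, the filtering happens on the server and no round-robin distribution is involved.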

Is there a performance difference between pooling connections or channels in rabbitmq?

I'm a newbie with RabbitMQ (and programming), so sorry in advance if this is obvious. I am creating a pool to share between threads that are working on a queue, but I'm not sure if I should use connections or channels in the pool.
I know I need channels to do the actual work, but is there a performance benefit to having one channel per connection (in terms of more throughput from the queue)? Or am I better off just using a single connection per application and pooling many channels?
note: because I'm pooling the resources the initial cost is not a factor, as I know connections are more expensive than channels. I'm more interested in throughput.
I found this on the RabbitMQ website. It is near the bottom, so I have quoted the relevant part below.
The tl;dr version is that you should have 1 connection per application and 1 channel per thread.
Connections
AMQP connections are typically long-lived. AMQP is an application
level protocol that uses TCP for reliable delivery. AMQP connections
use authentication and can be protected using TLS (SSL). When an
application no longer needs to be connected to an AMQP broker, it
should gracefully close the AMQP connection instead of abruptly
closing the underlying TCP connection.
Channels
Some applications need multiple connections to an AMQP broker.
However, it is undesirable to keep many TCP connections open at the
same time because doing so consumes system resources and makes it more
difficult to configure firewalls. AMQP 0-9-1 connections are
multiplexed with channels that can be thought of as "lightweight
connections that share a single TCP connection".
For applications that use multiple threads/processes for processing,
it is very common to open a new channel per thread/process and not
share channels between them.
Communication on a particular channel is completely separate from
communication on another channel, therefore every AMQP method also
carries a channel number that clients use to figure out which channel
the method is for (and thus, which event handler needs to be invoked,
for example).
It is advised that there is 1 channel per thread, even though they are thread safe, so you could have multiple threads sending through one channel. In terms of your application I would suggest that you stick with 1 channel per thread though.
Additionally it is advised to only have 1 consumer per channel.
These are only guidelines so you will have to do some testing to see what works best for you.
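One way to follow the "one connection per application, one channel per thread" guideline is to keep the channel in thread-local storage. The sketch below shows only that pattern: `StubConnection` is a hypothetical stand-in for a real client library's connection object (the only thing assumed of it is a `channel()` method), and `ChannelPerThread` is an invented helper name.

```python
import itertools
import threading


class StubConnection:
    """Hypothetical stand-in for a broker connection; not a real client class."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def channel(self):
        with self._lock:  # opening channels is serialized in this sketch
            return f"channel-{next(self._ids)}"


class ChannelPerThread:
    """Lazily opens one channel per thread on a single shared connection."""

    def __init__(self, connection):
        self._connection = connection
        self._local = threading.local()

    def get(self):
        if not hasattr(self._local, "channel"):
            self._local.channel = self._connection.channel()
        return self._local.channel


channels = ChannelPerThread(StubConnection())
seen = []
worker = threading.Thread(target=lambda: seen.append(channels.get()))
worker.start()
worker.join()
main_channel = channels.get()  # differs from the worker thread's channel
```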
This thread has some insights here and here.
Despite all these guidelines, this post suggests that having multiple connections will most likely not affect performance, though it does not specify whether it is talking about the client side or the server (RabbitMQ) side. The one caveat is that more connections will of course use more system resources. If this is not a problem and you wish to have more throughput, it may indeed be better to have multiple connections, as this post suggests multiple connections will allow you more throughput. The reason seems to be that even if there are multiple channels, only one message goes through the connection at a time. Therefore a large message will block the whole connection, or many unimportant messages on one channel may block an important message on the same connection but a different channel. Again, resources are an issue: if you are using up all the bandwidth with one connection, then adding a second connection will not increase performance over having two channels on the one connection. Also, each connection will use more memory, CPU, and file handles. That may well not be a concern now, but it might become an issue when scaling.
In addition to the accepted answer:
If you have a cluster of RabbitMQ nodes with either a load-balancer in front, or a short-lived DNS (making it possible to connect to a different rabbit node each time), then a single, long-lived connection would mean that one application node works exclusively with a single RabbitMQ node. This may lead to one RabbitMQ node being more heavily utilized than the others.
The other concern mentioned above is that publishing and consuming are blocking operations, which leads to queueing of messages. Having more connections will ensure that 1. the processing time for each message doesn't block other messages, and 2. big messages aren't blocking other messages.
That's why it's worth considering having a small connection pool (having in mind the resource concerns raised above)
The "one channel per thread" might be a safe assumption (I say might as I have not made any research by myself and I have no reason to doubt the documentation :) ) but beware that there is a case where this breaks:
If you use RPC with RabbitMQ Direct reply-to, then you cannot reuse the same channel to consume for another RPC request. I asked for details about that in the Google user group, and the answer I got from Michael Klishin (who seems to be actively involved in RabbitMQ development) was that
Direct Reply to is not meant to be used with channel sharing either way.
I've emailed Pivotal asking them to update their documentation to explain how amq.rabbitmq.reply-to works under the hood, and I'm still waiting for an answer (or an update).
So if you want to stick to "one channel per thread", beware that this will not work well with Direct reply-to.

What is Microsoft Message Queuing (MSMQ)? How does it work?

I need to work with MSMQ (Microsoft Message Queuing). What is it, what is it for, how does it work? How is it different from web services?
With all due respect to @Juan's answer, both are ways of exchanging data between two disconnected processes, i.e. interprocess communication (IPC) channels. Message queues are asynchronous, while web services are synchronous. They use different protocols and back-end services to do this, so they are completely different in implementation but similar in purpose.
You would want to use message queues when there is a possibility that the other communicating process may not be available, yet you still want to have the message sent at the time of the client's choosing. Delivery will occur when the process on the other end wakes up and receives notification of the message's arrival.
As its name states, it's just a queue manager.
You can Send objects (serialized) to the queue where they will stay until you Receive them.
It's normally used to send messages or objects between applications in a decoupled way
It has nothing to do with webservices, they are two different things
Info on MSMQ:
https://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx
Info on WebServices:
http://msdn.microsoft.com/en-us/library/ms972326.aspx
Transactional Queue Management 101
A transactional queue is a middleware system that asynchronously routes messages of one sort or another between hosts that may or may not be connected at any given time. This means that it must also be capable of persisting the message somewhere. Examples of such systems are MSMQ and IBM MQ.
A Transactional Queue can also participate in a distributed transaction, and a rollback can trigger the disposal of messages. This means that a message is guaranteed to be delivered with at-most-once semantics or guaranteed delivery if not rolled back. The message won't be delivered if:
Host A posts the message but Host B is not connected
Something (possibly but not necessarily initiated from Host A) rolls back the transaction
B connects after the transaction is rolled back
In this case B will never be aware the message even existed unless informed through some other medium. If the transaction was rolled back, this probably doesn't matter. If B connects and collects the message before the transaction is rolled back, the rollback will also reverse the effects of the message on B.
Note that A can post the message to the queue with the guarantee of at-most-once delivery. If the transaction is committed Host A can assume that the message has been delivered by the reliable transport medium. If the transaction is rolled back, Host A can assume that any effects of the message have been reversed.
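The commit/rollback behaviour can be illustrated with a small in-memory model. This is a sketch, not MSMQ or IBM MQ code: `TransactionalQueue` and its methods are hypothetical, and it makes the simplifying assumption that a posted message becomes visible to receivers only on commit, so the case where B collects a message before the rollback is not modelled.

```python
class TransactionalQueue:
    """Toy queue: messages posted in a transaction appear only after commit."""

    def __init__(self):
        self.visible = []  # committed messages a receiver can collect
        self.staged = []   # messages posted inside the open transaction

    def post(self, message):
        self.staged.append(message)

    def commit(self):
        self.visible.extend(self.staged)
        self.staged.clear()

    def rollback(self):
        self.staged.clear()  # rolled-back messages are disposed of

    def receive(self):
        return self.visible.pop(0) if self.visible else None


q = TransactionalQueue()
q.post("order-1")
q.rollback()            # Host B connecting now finds nothing
nothing = q.receive()
q.post("order-2")
q.commit()              # only now can Host B collect the message
delivered = q.receive()
```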
Web Services
A web service is a remote procedure call or other service (e.g. RESTful APIs) published by a (typically) HTTP server. It is a synchronous request/response protocol and has no guarantee of delivery built into the protocol. It is up to the client to validate that the service has been correctly run. Typically this will be through a reply to the request or a timeout of the call.
In the latter case, web services do not guarantee at-most-once semantics. The server can complete the service and fail to deliver a response (possibly through something outside the server going wrong). The application must be able to deal with this situation.
IIRC, RESTful services should be idempotent (the same state is achieved after any number of invocations of the same service), which is a strategy for dealing with this lack of guaranteed notification of success/failure in web service architectures. The idea is that conceptually one writes state rather than invoking a service, so one can write any number of times. This means that a lack of feedback about success can be tolerated by the application, as it can retry the posting until it gets a 'success' message from the server.
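The retry-until-success idea only works because the write is idempotent, which a short simulation makes visible. Everything here is hypothetical: `FlakyServer` applies each write but loses its first few replies, and `put_with_retry` is an invented client helper; no real HTTP library is involved.

```python
class FlakyServer:
    """Hypothetical server that applies every write but loses the first replies."""

    def __init__(self, lost_replies):
        self.state = {}
        self.lost_replies = lost_replies
        self.writes_applied = 0

    def put(self, key, value):
        self.state[key] = value  # idempotent: sets state rather than incrementing it
        self.writes_applied += 1
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("reply lost")  # the write happened, the client can't tell
        return "success"


def put_with_retry(server, key, value, max_attempts=5):
    """Retry the same write until a reply arrives; returns the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            server.put(key, value)
            return attempt
        except TimeoutError:
            continue
    raise RuntimeError("gave up without confirmation")


server = FlakyServer(lost_replies=2)
attempts = put_with_retry(server, "order:1", "paid")
# The write was applied three times, yet the final state is the same as after one.
```

Had the operation been "add 10 to the balance" instead of "set the status to paid", the same retries would have corrupted the state.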
Note that you can use Windows Communication Foundation (WCF) as an abstraction layer above MSMQ. This gives you the feel of working with a service - with only one-way operations.
For more information, see:
http://msdn.microsoft.com/en-us/library/ms789048.aspx
Actually there is no relation between MSMQ and WebService.
MSMQ is used for interprocess communication (you can also use sockets, Windows messaging, or mapped memory).
It is a Windows service that is responsible for keeping messages until someone dequeues them.
You could say it is more reliable than sockets, as messages are stored on a hard disk, but it is slower than other IPC techniques.
You can use MSMQ in .NET with a few lines of code: just declare your MessageQueue object and call its Receive and Send methods.
The Message itself can be normal string or binary data.
As everyone has explained, MSMQ is used as a queue for messages. Messages can be a wrapper for actual data, objects, and anything else that you can serialize and send across the wire. MSMQ has its own limitations. MSMQ 1.0 and MSMQ 2.0 had a 4MB message limit. This restriction was lifted with MSMQ 3.0. Message-oriented middleware (MOM) is a concept that depends heavily on messaging. The Enterprise Service Bus foundation is built on messaging. All these new technologies depend on messaging for reliable asynchronous data delivery.
MSMQ stands for Microsoft Message Queuing.
It is simply a queue that stores formatted messages so that they can be passed to a DB (which may be on the same machine or on a server). There are different types of queues, which categorize the messages among themselves.
If there is a problem with a message, or an invalid message is passed, it automatically goes to the dead-letter queue, which denotes that it will not be processed further. Before a message is moved there, processing is retried up to a maximum count; if it still cannot be processed, it is sent to the dead-letter queue.
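The retry-then-dead-letter flow described above is a general messaging pattern and can be sketched generically. The code below is not MSMQ's API: `process_with_retries`, the `max_attempts` parameter, and the use of `ValueError` to signal a bad message are all assumptions for illustration.

```python
def process_with_retries(messages, handler, max_attempts=3):
    """Try each message up to max_attempts times, then dead-letter it."""
    processed, dead_letters = [], []
    for message in messages:
        for _ in range(max_attempts):
            try:
                processed.append(handler(message))
                break  # handled successfully, move to the next message
            except ValueError:
                continue  # try again until the attempts run out
        else:
            # The loop finished without a successful attempt.
            dead_letters.append(message)
    return processed, dead_letters


# int() stands in for a handler that rejects malformed messages.
processed, dead_letters = process_with_retries(["1", "oops", "3"], int)
```

Here the malformed message ends up in `dead_letters` after three failed attempts, while the valid ones are processed normally.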
It is generally used for sending log messages from a client machine to a server or DB, so that if any issue happens on the client machine, the developer or support team can go through the log to solve the problem.
MSMQ is also a service provided by Microsoft that can be used to collect log records.
You can get a better idea from this page: http://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx