Is it possible for a single message to be given to multiple instances of the same subscription in Google Cloud Pub/Sub?

I have a publisher that publishes messages to a particular topic (myTopic). Then in Pub/Sub I create a subscription named myTopicSub to this topic (myTopic), and I have a VM that runs a service that listens on my subscription myTopicSub.
THIS WORKS
MY PROBLEM IS: if there is a need to scale and I add 5 more VMs to handle more messages from my subscription... is it possible for Pub/Sub to send the same message to more than one VM?
Because I need each message to be processed by only one VM. Please, I need help.

Cloud Pub/Sub offers at-least-once delivery. That means that a message can be delivered multiple times and in some cases, can be delivered to two different subscribers for the same subscription within a short period of time. That particular type of duplicate delivery is rare, but not impossible.
Subscribers have to be able to handle the delivery of duplicates and, depending on their nature, may handle it in different ways. For some, all actions are idempotent, so re-processing the same message has no ill effects. In other cases, the subscribers need to track which messages they have received and processed, and if a message is a duplicate, just immediately ack it instead of processing it.
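As a rough sketch of the second approach, tracking processed IDs and acking duplicates right away, using the Python Pub/Sub client; the project and subscription names are placeholders, and the in-memory set shown here would need to be replaced by shared, durable storage if several VMs consume from the same subscription:

```python
# Sketch only: deduplicate redeliveries by remembering processed message IDs.
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"        # placeholder
SUBSCRIPTION_ID = "myTopicSub"   # subscription shared by all worker VMs

processed_ids = set()            # per-process store; use a shared DB across VMs

def handle(data):
    print("processing", data)    # your (ideally idempotent) processing

def callback(message):
    if message.message_id in processed_ids:
        message.ack()            # duplicate delivery: ack immediately, skip work
        return
    handle(message.data)
    processed_ids.add(message.message_id)
    message.ack()                # ack only after processing succeeded

subscriber = pubsub_v1.SubscriberClient()
path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
streaming_pull = subscriber.subscribe(path, callback=callback)
streaming_pull.result()          # block this worker and keep receiving
```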

Related

Google PubSub with pull subscriber design flaw?

We are using Google's streaming pull subscriber. The design is as follows.
We are doing the following:
the FE (frontend) sends a file to the BE (backend)
the BE converts that file to a ByteArray and publishes it to the Pub/Sub topic as a message (so the ByteArray is the message payload)
the topic delivers that message to the subscriber, and the subscriber converts the ByteArray back into a file
the subscriber sends that converted file to the tool
the tool does some work with the file and notifies the subscriber of the status
that status goes to the BE, and the BE updates the DB and sends the status to the FE
Now, in our subscriber, when we receive a message we acknowledge it immediately and remove the subscriber's listener so that we don't receive any more messages,
and when the tool has finished its work, it sends the status to the subscriber (we have an Express server running on the subscriber), and
after receiving the status we re-create the subscriber's listener to receive messages again.
Note:
the tool may take 1 hour or more to do its work
we are using ordering keys to properly distribute messages to the VMs
This code is working fine, but my questions are:
is there any flaw in this (because we are removing the listener and then re-creating it, or anything like that)?
or is there any better option or GCP service that fits this design better?
or any improvements to the code?
EDIT:
Removed code sample
I would say that there are several parts of this design that are sub-optimal. First of all, acking a message before you have finished processing it means you risk message loss. What happens if your tool or subscriber crashes after acking the message, but before processing has completed? This means when the processes start back up, they will not receive the message again. Are you okay with requests from the frontend possibly never being processed? If not, you'll want to ack after processing is completed, or--given that your processing takes so long--persist the request to a database or to some storage and then acknowledge the message. If you are going to have to persist the file somewhere else anyway, you might want to consider taking Pub/Sub out of the picture and just writing the file to storage like GCS and then having your subscribers instead read out of GCS directly.
Secondly, stopping the subscriber upon each message being received is an anti-pattern. Your subscriber should be receiving and processing each message as it arrives. If you need to limit the number of messages being processed in parallel, use message flow control.
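A minimal sketch of that pattern with the Python Pub/Sub client (the project, subscription and the run_tool step are placeholders): flow control caps how many messages this client holds at once, and the ack happens only after the work is done rather than up front:

```python
# Sketch only: limit concurrent deliveries with flow control, ack after processing.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
path = subscriber.subscription_path("my-project", "my-subscription")  # placeholders

def run_tool(data):
    pass                        # stand-in for the long-running external tool

def callback(message):
    run_tool(message.data)
    message.ack()               # ack only once processing has finished

# At most 2 messages are outstanding to this subscriber client at any time.
# For hour-long work you may also need to raise the lease limit
# (FlowControl has a max_lease_duration setting).
flow_control = pubsub_v1.types.FlowControl(max_messages=2)
future = subscriber.subscribe(path, callback=callback, flow_control=flow_control)
future.result()
```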
Also, ordering keys aren't really a way to "properly distribute messages to VMs." Ordering keys are only a means to ensure ordered delivery. There are no guarantees that messages for the same ordering key will continually go to the same subscriber client. In fact, if you shut down the subscriber client after receiving each message, then another subscriber could receive the next message for that ordering key since you've acked the earlier message. If all you mean by "properly distribute messages" is that you want the messages delivered in order, then this is the correct way to use ordering keys.
You say you have a subscription per client; whether or not that is the right thing to do depends on what you mean by "client." If "client" means "user of the front end," then I imagine you plan to have a different topic per user as well. If so, then you need to keep in mind the 10,000 topic-per-project limit. If you mean that each VM has its own subscription, then note that each VM is going to receive every message published to the topic. If you only want one VM to receive each message, then you need to use the same subscription across all VMs.
In general, also keep in mind that Cloud Pub/Sub has at-least-once delivery semantics. That means that even an acknowledged message could be redelivered, so you do need to be prepared to handle duplicate message delivery.

Reference counted Pub/Sub system

I am searching for a way to design my system that consists of multiple publishers, multiple channels and multiple subscribers, all of which can be uniquely identified easily.
I need to send messages in both directions, with as low a latency as possible. However, if a subscriber dies, the messages it subscribed to should not be dropped; when it comes back online, it should receive all pending messages. I handle very high numbers of messages (up to 1,000 per second on a regular basis) on a low-spec server, so keeping lists of all messages at all times is not an option.
I was considering whether a reference count/list per message is a viable option. When a message is published, it is initialized with a list of the subscribers to that specific channel; when a subscriber receives the message, that subscriber is removed from the list. The message is removed once the list is empty.
Now, if a subscriber dies without unsubscribing, the messages will not be removed because the list of missing subscribers is not empty. When it comes back online, it will be able to receive the list of all pending messages, since it identifies with the same ID as the dead instance.
Perhaps it would be required to have messages/subscribers time out, for example if a subscriber has been inactive for 10 minutes, all list entries containing it are cleared.
Is that a good idea? Have I forgotten problems that could arise with this system in particular? Is there any system that already does this? RabbitMQ and similar pub/sub systems don't seem to have it - if not, I guess Redis is the way to go?
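For concreteness, a minimal in-memory sketch of the reference-counted scheme the question describes; all names are hypothetical, and persistence, locking and the inactivity-timeout machinery are left out:

```python
# Sketch only: each message carries the set of subscriber IDs that still have
# to receive it and is dropped once that set becomes empty.
class RefCountedChannel:
    def __init__(self):
        self.subscribers = set()   # currently registered subscriber IDs
        self.messages = []         # list of [payload, remaining_subscriber_ids]

    def subscribe(self, sub_id):
        self.subscribers.add(sub_id)

    def publish(self, payload):
        # Initialize the message with a reference to every current subscriber.
        self.messages.append([payload, set(self.subscribers)])

    def fetch(self, sub_id):
        """Deliver all pending messages for sub_id and drop its references."""
        delivered = []
        for payload, remaining in self.messages:
            if sub_id in remaining:
                delivered.append(payload)
                remaining.discard(sub_id)
        # Messages with no remaining references are removed entirely.
        self.messages = [m for m in self.messages if m[1]]
        return delivered

    def drop_subscriber(self, sub_id):
        """Timeout patch: call this when sub_id has been inactive too long."""
        self.subscribers.discard(sub_id)
        for _, remaining in self.messages:
            remaining.discard(sub_id)
        self.messages = [m for m in self.messages if m[1]]

# Example: a subscriber that was offline still finds its pending messages.
channel = RefCountedChannel()
channel.subscribe("worker-1")
channel.subscribe("worker-2")
channel.publish(b"event")
print(channel.fetch("worker-1"))   # [b'event']; worker-2's reference remains
print(channel.fetch("worker-2"))   # [b'event']; now the message is gone
```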
I can imagine managing a reference count for the purpose of message lifecycle. This sounds reasonable in terms of message and memory management during normal service operation. Of course, timeouts provide a patch for references held by dead services.
However, in terms of health monitoring and service recovery, this is quite another story.
The danger that I currently see here is state management. Imagine a service that is a stateful subscriber (i.e. has a state machine) that is driven from its initial state (I) to a certain state (S). Each message is processed differently in different states. Now imagine that your service dies and gets restarted. Meanwhile some messages are stored, and after the service is back online, they are dispatched to it. However, the service receives them in the wrong state (I instead of S) and acts unexpectedly.
Can you restore the service to the exact state it was in when it crashed? In practice, this is extremely difficult, since even with the state machine approach the service has side effects / communicates with global state(s), etc.
Bottom line: reference counting seems reasonable in terms of managing messages, but mixing it with health monitoring results in a lot of complexity.

Is RabbitMQ capable of "pushing" messages from a queue to a consumer?

With RabbitMQ, is there a way to "push" messages from a queue TO a consumer as opposed to having a consumer "poll and pull" messages FROM a queue?
This has been the cause of some debate on a current project I'm on. The argument from one side is that having consumers (i.e. a Windows service) "poll" a queue until a new message arrives is somewhat inefficient and less desirable than the idea of having the message "pushed" automatically from the queue to the subscriber(s)/consumer(s).
I can only seem to find information supporting the idea of consumers "polling and pulling" messages off of a queue (e.g. using a Windows service to poll the queue for new messages). There isn't much information on the idea of "pushing" messages to a consumer/subscriber...
Having the server push messages to the client is one of the two ways to get messages to the client, and the preferred way for most applications. This is known as consuming messages via a subscription.
The client is connected. (The AMQP/RabbitMQ/most messaging systems model is that the client is always connected - except for network interruptions, of course.)
You use the client API to arrange that your channel consume messages by supplying a callback method. Then whenever a message is available the server sends it to the client over the channel and the client application gets it via an asynchronous callback (typically one thread per channel). You can set the "prefetch count" on the channel which controls the amount of pipelining your client can do over that channel. (For further parallelism an application can have multiple channels running over one connection, which is a common design that serves various purposes.)
The alternative is for the client to poll for messages one at a time, over the channel, via a get method.
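For illustration, a short sketch with the pika Python client showing both styles (queue name and connection details are placeholders): a push-style consumer registered with a callback and a prefetch limit, and a one-off pull with basic_get:

```python
# Sketch only: push (subscription with callback) vs. pull (basic_get) in pika.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)

# Push: register a consumer; the broker delivers messages to the callback.
channel.basic_qos(prefetch_count=10)   # pipelining limit for this channel

def on_message(ch, method, properties, body):
    print("received", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="task_queue", on_message_callback=on_message)
# channel.start_consuming()            # blocks and dispatches callbacks

# Pull: fetch a single message on demand (shown on a separate channel).
pull_channel = connection.channel()
method, properties, body = pull_channel.basic_get(queue="task_queue", auto_ack=False)
if method is not None:
    pull_channel.basic_ack(delivery_tag=method.delivery_tag)
```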
You "push" messages from Producer to Exchange.
https://www.rabbitmq.com/tutorials/tutorial-three-python.html
BTW, this fits IoT scenarios very well. Devices produce messages and send them to an exchange. The queue handles persistence, FIFO and other features, as well as delivery of messages to subscribers.
And, by the way, you never "poll" the queue. Instead, you always subscribe to the publisher, similar to the observer pattern. Generally, I would say it is an ingenious principle.
So it is similar to a post box or post office, except that it sends you a notification when a message is available.
Quoting from the docs here:
AMQP brokers either deliver messages to consumers subscribed to
queues, or consumers fetch/pull messages from queues on demand.
And from here:
Storing messages in queues is useless unless applications can consume
them. In the AMQP 0-9-1 Model, there are two ways for applications to
do this:
Have messages delivered to them ("push API")
Fetch messages as needed ("pull API")
With the "push API", applications have to indicate interest in
consuming messages from a particular queue. When they do so, we say
that they register a consumer or, simply put, subscribe to a queue. It
is possible to have more than one consumer per queue or to register an
exclusive consumer (excludes all other consumers from the queue while
it is consuming).
Each consumer (subscription) has an identifier called a consumer tag.
It can be used to unsubscribe from messages. Consumer tags are just
strings.
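As a small illustration of consumer tags with the pika Python client (queue name is a placeholder): basic_consume returns the tag, and basic_cancel uses it to unsubscribe:

```python
# Sketch only: subscribe to a queue, then unsubscribe using the consumer tag.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)

def on_message(ch, method, properties, body):
    ch.basic_ack(delivery_tag=method.delivery_tag)

# basic_consume returns the consumer tag (a string) identifying this subscription.
tag = channel.basic_consume(queue="task_queue", on_message_callback=on_message)

# ... later, stop receiving messages on this subscription:
channel.basic_cancel(consumer_tag=tag)
connection.close()
```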
The RabbitMQ broker is like a server that won't send data to a consumer unless the consumer client registers itself with the server. But then a question comes up, like the one below:
Can RabbitMQ keep the client consumer's details and connect to the client when a message arrives?
The answer is no. So what is the alternative? Well, then write a plugin yourself that maintains client information in some kind of config. The plugin will pull from the RabbitMQ queue and push to the client.
Please take a look at this plugin; it might help.
https://www.rabbitmq.com/shovel.html
Frankly speaking, the client needs to implement the AMQP protocol to receive messages, and for that it should listen for connections on some port. This sounds like another server.

Routing MSMQ messages from one queue to another

Is there some standard configuration setting, service, or tool that accepts messages from one queue and moves them on to another one, automatically handling the dead-message problem and providing some retry capability? I was thinking this is what "MSMQ Message Routing" does, but I can't seem to find documentation on it (except for Windows Mobile 6, and I don't know if that's relevant).
Context:
I understand that when using MSMQ you should always write to a local queue so that failure is unlikely, and then X should move that message to a remote queue. Is my understanding wrong? Is this where messaging infrastructure like BizTalk comes in? Is it unnecessary to write to a local queue first to absolutely ensure success? Am I supposed to build X myself?
As Hugh points out, you need only one MSMQ queue to send messages in one direction from a source to a destination. The source and destination can be on the same server, on the same network, or across the internet; however, both source and destination must have the MSMQ service running.
If you need to do 'message' routing (e.g. a switch which processes messages from several source or destination queues, or routes a message to one or more subscribers based on the type of message, etc.), you would need more than just an MSMQ queue.
Although you certainly can use BizTalk to do message routing, this would be expensive / overkill if you didn't need to use other features of BizTalk. Would recommend you look at open source, or building something custom yourself.
But by "Routing" you might be referring to the queue redirection capability when using HTTP as the transport e.g. over the internet (e.g. here and here).
Re: Failed delivery and retry
I think you have most of the concepts - generally the message DELIVERY retry functionality should be implicit in MSMQ. If MSMQ cannot deliver the message before the defined expiry, then it will be returned on the Dead Letter Queue, and the source can then process messages from the DLQ and then 'compensate' for them (e.g. reverse the actions of the 'send', indicate failure to the user, etc).
However 'processing' type Retries in the destination will need to be performed by the destination application / listener (e.g. if the destination system is down, deadlocks, etc)
Common ways to do this include:
Using 2-phase commit - under a distributed unit of work, pull the message off MSMQ and process it (e.g. insert data into a database, change the status of some records, etc.), and if any failure is encountered, the message is left on the queue and the DB changes are rolled back.
Application-level retries - i.e. on the destination system, in the event of 'retryable' errors (timeouts due to load, deadlocks, etc.), sleep for a few seconds and then retry the same transaction.
However, in most cases, indefinite processing retries are not desirable and you would ultimately need to admit defeat and implement a mechanism to log the message and the error and remove it from the queue.
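As a language-agnostic illustration (sketched in Python, not tied to MSMQ, with all helper names hypothetical), the bounded retry-then-give-up pattern might look like this:

```python
# Sketch only: retry transient failures a few times, then log and dead-letter.
import time

MAX_ATTEMPTS = 3

class TransientError(Exception):
    """Stand-in for retryable errors: timeouts under load, deadlocks, etc."""

def process(message):
    pass                               # stand-in for the real destination work

def dead_letter(message, error):
    print("moved to error store:", message, error)

def handle_with_retries(message):
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(message)           # e.g. insert data into the database
            return True
        except TransientError as err:
            last_error = err
            time.sleep(2 ** attempt)   # back off before the next attempt
    # Admit defeat: record the message and the error, remove it from the queue.
    dead_letter(message, last_error)
    return False
```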
But I wouldn't 'retry' business failures (e.g. Business Rules, Validation etc) and the behaviour should be defined in your requirements of how to handle these (e.g. account is overdrawn, message is not in a correct format or not valid, etc), e.g. by returning a "NACK" type message back to the source.
MSMQ sends messages from one queue to another queue.
Let's say you have a queue on a remote machine. You want to send a message to that queue.
So you create a sender. A sender is an application that can use the MSMQ transport to send a message. This can be a .NET queue client (System.Messaging), a WCF service consumer (over either netMsmqBinding or msmqIntegrationBinding), BizTalk using the MSMQ adapter, etc.
When you send the message, what actually happens is:
The MSMQ queue manager on the sender machine writes the message to a temporary local queue.
The MSMQ queue manager on the sender machine connects to the MSMQ manager on the receiving machine and transmits the message.
The MSMQ queue manager on the receiver's machine puts the message onto the destination queue.
In certain situations MSMQ will encounter messages which for some reason or another cannot be received on the destination queue. In these situations, if you have indicated that a message will use the dead-letter queue then MSMQ will make sure that the message is forwarded to the dead-letter queue.

What functionality is there in a queue that can't be achieved by a topic?

What functionality is there in a queue that can't be achieved by a topic?
The main requirement that I run into is that consumers cannot compete for a single message on a topic. For example, I have a client who publishes call center events. Several systems subscribe to these events. One of these systems is the actual call routing application which has multiple instances running. If each instance subscribes then the call is routed to all of them. However, if the message is dropped onto a queue and all the instances consume off the same queue then only one will receive the message and the call goes to that operator. If the publishing application converts from topics to a queue, the call center works but all the other subscriber apps don't get the message.
The solution (as implemented in WebSphere MQ) was to create an administrative subscription on the topic and deliver the messages to a queue that all application instances consume from. So the producer apps are still publishers, all the dynamic subscribers still get copies of the message and the call center app instances compete for a single instance of each published message.
Also, you can't use browse semantics on a topic whereas you can on a queue. With topics you can specify selectors to filter the messages that are returned but that's about it. With queues you can browse, reset the browse pointer and then browse some more.
If you put a message on a queue and nothing is there to receive it, the message remains queued up. If you put a message to a topic and there are no active subscribers or durable subscriptions, the message is discarded. Therefore messages in a queue are naturally durable whereas messages on a topic may or may not be.
From a pure JMS perspective, queue and topic are both instances of destination and are interchangeable if you don't try to browse. An application may not know whether the destination it opens is a queue or a topic unless it checks with instanceof at run time.