Master/Slave pattern on Google Cloud using Pub/Sub - publish-subscribe

We want to build a master/slave pattern on Google Cloud.
We planned to use Pub/Sub for that (similar to the JMS pattern), letting each worker grab a task from the queue and ack when done.
But it seems like a subscriber can't get messages sent before it started.
And we're not sure how to make sure each message will be processed by a single 'slave'.
Is there a way to do it? Or is there another mechanism on Google Cloud for that?

As far as I understand the master/slave pattern, the slaves do the tasks in parallel and the master harvests the results. I'd create a topic for queuing the tasks and a single subscription attached to this topic, so that all the slaves use this one subscription to fetch tasks.
I'd also create another topic/subscription pair for publishing results from the slaves, which the master harvests. Alternatively, the results can be stored in a shared datastore such as Cloud Datastore.
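For illustration, a minimal sketch of that layout with the Python Pub/Sub client (the project, topic, and subscription names and the do_work function are hypothetical):

from google.cloud import pubsub_v1

project_id = "my-project"
subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()

# One shared subscription on the task topic; every slave pulls from it.
task_sub = subscriber.subscription_path(project_id, "tasks-sub")
results_topic = publisher.topic_path(project_id, "results")

def do_work(data: bytes) -> bytes:
    ...  # stand-in for the actual task

def handle_task(message: pubsub_v1.subscriber.message.Message) -> None:
    result = do_work(message.data)
    publisher.publish(results_topic, result)  # the master harvests these
    message.ack()  # ack only once the work is done

# Each slave runs this same loop against the single shared subscription.
subscriber.subscribe(task_sub, callback=handle_task).result()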

You can do this by creating a single subscription that is then used by all the slaves. The Pub/Sub service delivers each new message to only one subscriber on a given subscription, so you can be sure that a given message will be processed by only one slave.
You should also adjust the acknowledgement deadline appropriately so that delivery retries don't happen; if a retry happens, it will result in multiple slaves getting the same message.
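For example, a sketch of creating the shared subscription with a longer acknowledgement deadline using the Python client (names are hypothetical):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

subscriber.create_subscription(
    request={
        "name": subscriber.subscription_path("my-project", "tasks-sub"),
        "topic": publisher.topic_path("my-project", "tasks"),
        # Give a slave up to 10 minutes (the maximum) to ack before redelivery.
        "ack_deadline_seconds": 600,
    }
)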

Related

How to list/view messages on subscription in Gcloud pubsub?

I can acknowledge all messages on a subscription as follows:
gcloud pubsub subscriptions pull --auto-ack --limit=999 my-sub
(Although I often have to run this repeatedly before all messages are acknowledged).
However, I don't want to acknowledge them, I just want to see all unacknowledged messages (or just a count of how many unacknowledged messages there are would be helpful too).
I thought it might be:
gcloud pubsub subscriptions pull --limit=1 my-sub
But when I run this command it shows a different message every time.
I did this in the hope that I could run:
gcloud pubsub subscriptions pull --limit=999 my-sub
To see all unacknowledged messages.
You can use gcloud pubsub subscriptions pull to get messages for a subscription. If you do not pass in --auto-ack, then some of the messages should be displayed when you make the call. These messages will likely not be returned in subsequent requests until the ack deadline has passed since they will be considered outstanding when returned to the gcloud command. That is why you see a different message every time you call with --limit=1.
Additionally, setting the limit for the call to pull does not guarantee that all messages will be returned in a single response, only that no more than that many messages will be returned. If you want to see all messages, you'll have to run the command repeatedly, but you won't necessarily be able to see all of them in a deterministic way.
You would probably be better off writing a little subscriber app that receives messages and ultimately nacks them (or just lets the ack deadline expire if you aren't worried about ensuring the messages are delivered quickly to another subscriber).
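For instance, a sketch of such an app in Python (project and subscription names are hypothetical); note that because it nacks, the same message may be redelivered to it and printed more than once:

from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-project", "my-sub")

def peek(message: pubsub_v1.subscriber.message.Message) -> None:
    print(message.message_id, message.data)
    message.nack()  # return the message so real subscribers can still get it

streaming_pull = subscriber.subscribe(sub_path, callback=peek)
try:
    streaming_pull.result(timeout=30)  # watch for 30 seconds, then stop
except TimeoutError:
    streaming_pull.cancel()
    streaming_pull.result()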
Unfortunately, there is no direct command to get the unacknowledged messages or the number of unacknowledged messages in Pub/Sub.
However, you can use Cloud Monitoring with the metric
pubsub.googleapis.com/subscription/num_undelivered_messages.
Cloud Monitoring has an API, so you can get this number programmatically.
This is how to do it:
You can get the values via the projects.timeSeries.list method. Set the name to projects/<your project> and use this filter:
metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"
Or, if you want a specific subscription, you can add this filter:
resource.label.subscription_id = "<subscription name>".
The result will be one or more TimeSeries objects whose points field contains the data points for the specified time range, with each point's int64 value set to the number of messages unacknowledged by subscribers.
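For example, a sketch with the Python Cloud Monitoring client (project and subscription names are hypothetical):

import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 300}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.label.subscription_id = "my-sub"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Points are returned newest first; int64_value is the backlog size.
    print(series.points[0].value.int64_value)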
Also, see the official documentation on the Introduction to the Cloud Monitoring API and Monitoring your API usage.

How do I make sure that I process one message at a time at most?

I am wondering how to process one message at a time using Google's Pub/Sub functionality in Go. I am using the official library for this, https://pkg.go.dev/cloud.google.com/go/pubsub#section-readme. The event is being consumed by a service that runs with multiple instances, so any in-memory locking mechanism will not work.
I realise that it's an anti-pattern to do this, so let me explain my use case. Using MongoDB, I store an array of objects as an embedded document for each entity. The event being published modifies parts of this array and saves it. If I receive more than one event at a time and they start processing at exactly the same time, one of the saves will override the other. So I was thinking a solution for this is to make sure that only one message is processed at a time, and it would be nice to use any built-in functionality in Cloud Pub/Sub to do so. Otherwise I was thinking of implementing some locking mechanism in the DB, but I'd like to avoid that.
Any help would be appreciated.
You can imagine 2 things:
You can use an ordering key in Pub/Sub. That way, all the messages relating to the same object will be delivered in order, one by one (see the sketch after this list).
You can use a PUSH subscription in Pub/Sub, pushing to Cloud Run or Cloud Functions. With Cloud Run, set the concurrency to 1 (it's the default with Cloud Functions gen1), and set the max instances to 1 as well. That way you can process only one message at a time; all the other messages will be rejected (HTTP 429 error code) and requeued by Pub/Sub. The problem is that you can't parallelize the processing as you can with ordering keys.
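A minimal sketch of option 1 in Python (the question uses Go, but the concept carries over; names are hypothetical, and the subscription must also be created with message ordering enabled):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True),
    # Ordered delivery also requires publishing through a regional endpoint.
    client_options={"api_endpoint": "us-east1-pubsub.googleapis.com:443"},
)
topic_path = publisher.topic_path("my-project", "my-topic")

# All messages sharing an ordering key are delivered in order, one at a time;
# keying by entity id serializes updates to that entity only.
future = publisher.publish(topic_path, b"update payload", ordering_key="entity-123")
print(future.result())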
A similar thing, and simpler to implement, is to use Cloud Tasks instead of Pub/Sub. With Cloud Tasks you can set a rate limit on a queue and set maxConcurrentDispatches to 1 (and you don't have to touch Cloud Functions max instances or Cloud Run max instances and concurrency).
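A rough sketch of that queue setup with the Python Cloud Tasks client (project, location, and queue names are hypothetical):

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.common_location_path("my-project", "us-central1")

# max_concurrent_dispatches=1 means the queue dispatches at most one task
# at a time, no matter how many instances the handler could scale to.
queue = tasks_v2.Queue(
    name=client.queue_path("my-project", "us-central1", "one-at-a-time"),
    rate_limits=tasks_v2.RateLimits(max_concurrent_dispatches=1),
)
client.create_queue(request={"parent": parent, "queue": queue})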

Architecture for ML jobs platform

I'm building a platform to run ML jobs.
Jobs will be started from an interface.
I'm making a service for each type of job. Sometimes, a service S1 might first need to make a request to another service S2 and get its output before running its own job.
Each service is split into 2 Kubernetes deployments:
one that pulls the message from a topic, checks it and persists it to a database (D1)
one that reads requests from the database, runs the actual job, updates the request state in the database and then answers the client (D2)
Here is the flow:
interface generates a PubSub message to a topic T1
D1 pulls the message from T1 and persists a request to the database
D2 sees the new request in the database, runs it, then updates its state in the database and answers the client
To answer the client, D2 has 2 options:
push a message to a PubSub topic T2 that is continuously checked by the client. An id is passed in both request and response so that only the client can pull it from the topic.
use a callback provided by the client to make a POST request
What do you think about this architecture? Does the usage of PubSub make sense? Also, does it make sense to split each service into 2 deployments (1 that deals with requests, 1 that runs the actual job)?
interface generates a PubSub message to a topic T1
D1 pulls the message from T1 and persists a request to the database
If there's only one database, I'm not sure I see much advantage in using a topic (implying pub/sub). Another approach would be to use a queue: the interface puts jobs into the queue, then you can have any number of workers processing it. Depending on the situation you may not even need the database at all, if all the data needed can be carried in the message on the queue.
use a callback provided by the client to make a POST request
That's better if you can do it, on the assumption that there's only one consumer for the event; pub/sub is more for broadcasting out to multiple consumers. Polling works but is really inefficient and has limits on how much it can scale.
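As a minimal sketch of the callback option, assuming D2 runs Python and the client supplied a callback URL with its request (the field names here are hypothetical):

import requests

def answer_client(job_id: str, result: dict, callback_url: str) -> None:
    # POST the finished job's result back to the URL the client provided.
    response = requests.post(
        callback_url,
        json={"job_id": job_id, "result": result},
        timeout=10,
    )
    # Surface failures so the request state can be marked for retry.
    response.raise_for_status()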
Also, does it make sense to split each service into 2 deployments (1 that deals with requests, 1 that runs the actual job)?
Having separate deployables makes sense if they are built by different teams and have a different release cadence, or if you need to scale them out independently; otherwise it may not be necessary.

ActiveMQ Artemis. Reliable cluster with synchronous replication

I want to configure a cluster with the following expected behavior:
Cluster must be HA (3 nodes at least).
I have queues in which it is important to maintain processing order. The consumer always reads such a queue in a single thread; once it has taken a message, we consider the task complete.
I don't need load balancing - it is important for me to maintain the order of messages.
I want to avoid split-brain.
If we have 3 nodes, then if 1 of the nodes fails, the cluster should continue to work.
I tried following configurations:
master + slave + slave with replication.
It works, but it does not solve the split-brain problem.
master + slave + slave + Pinger
As far as I understand, this does not give a 100% guarantee of detecting network problems. We can also get split-brain.
3 pairs of live/backup nodes.
This solves the split-brain problem, but how can we avoid the following situation:
The producer sends a message to group A, into a queue where processing order must be maintained.
Group A crashes (2 of the 6 nodes, i.e. 1/3 of the cluster).
The message is stored in group A's journal.
The cluster continues to work.
The producer sends the next message to group B, into the same order-sensitive queue.
The consumer gets this second message first, so the required message order is no longer maintained.
How should I build a cluster to solve these problems?
You can't achieve the behavior you want using replication. You need to use a shared store between the nodes. If you must use 3 nodes then I would recommend master + slave + slave. Otherwise I'd recommend master + slave.
Also, for what it's worth, replication is not synchronous within the broker. It is asynchronous and non-blocking. However, it is still reliable. For example, when a broker is configured for HA with replication and it receives a durable message from a client it will persist that message to disk and send it to the replicated backup concurrently without blocking. However, it will wait for both operations to finish before responding to the client that it has received the message. This allows much greater message throughput than using a synchronous architecture internally although the whole process will appear to be synchronous to external clients.
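A toy illustration of that pattern in Python (not Artemis internals; the helper functions are stand-ins): kick off both operations concurrently, but confirm to the client only after both finish.

from concurrent.futures import ThreadPoolExecutor

def write_to_journal(msg: bytes) -> None:
    ...  # stand-in for the local disk write

def send_to_backup(msg: bytes) -> None:
    ...  # stand-in for the network send to the replicated backup

pool = ThreadPoolExecutor(max_workers=2)

def store_durable_message(msg: bytes) -> str:
    disk = pool.submit(write_to_journal, msg)
    replica = pool.submit(send_to_backup, msg)
    # Neither operation blocks the other, but the producer still gets
    # its confirmation only after both have completed.
    disk.result()
    replica.result()
    return "OK"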
Also, it's worth noting that work is underway to change how replication works to make it more robust against split brain and to enable a single master + slave pair that is suitable for production use.

Using Celery with multiple workers in different pods

What I'm trying to do is use Celery with Kubernetes. I'm using Redis as the message broker in a different pod, and I have a pod for each Celery queue.
Imagine I have 3 queues; then I'd have 3 different pods (i.e. workers) that can accept and handle the requests.
Everything is working fine so far, but my question is: what would happen if I clone the pod of one of the queues, so that two pods serve a single queue?
I think the client (i.e. Django) creates a new message using Redis to send to the worker and start the job, but it's not clear to me what would happen because I have two pods listening to the same queue. Does the first pod accept the request and start the job, preventing the other pod from accepting the request?
(I tried to search the Celery documentation for clues but couldn't find any. That's why I'm asking this question.)
I guess you are using the basic task type, which employs a 'direct' queue type, not a 'fanout' or 'topic' queue; the latter two behave quite differently and will not be discussed here.
When Redis is used as the broker transport, Celery/Kombu uses a Redis list object as the storage for a queue (source), publishing messages with the LPUSH command and consuming them with BRPOP.
In short, BRPOP (doc) blocks the connection when there are no elements to pop from the given lists; if a list is not empty, an element is popped from its tail. This operation is guaranteed to be atomic: no two connections can get the same element.
Celery leverages this feature to guarantee at-least-once message delivery; the use of acknowledgements doesn't affect this guarantee.
In your case, there are multiple Celery workers across multiple pods, but all of them connect to the same Redis server and all of them block on the same key, trying to pop an element from the same list object. When a new message arrives, one and only one worker will get that message.
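To make this concrete, a small sketch with redis-py ("celery" is Celery's default queue name; the payload is illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

# What the client side effectively does: push a task message onto the list.
r.lpush("celery", '{"task": "app.tasks.do_work", "id": "abc123"}')

# What each worker does: block until an element is available. BRPOP is atomic,
# so even with many workers blocked on the same key across many pods, exactly
# one connection receives each element.
queue, payload = r.brpop("celery")
print(queue, payload)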
A task message is not removed from the queue until that message has been acknowledged by a worker. A worker can reserve many messages in advance and even if the worker is killed – by power failure or some other reason – the message will be redelivered to another worker.
More: http://docs.celeryproject.org/en/latest/userguide/tasks.html
The two workers (pods) will receive tasks and complete them independently. It's like having a single pod, but processing tasks at twice the speed.