What's the point of having a single Celery worker with multiple queues?

Continuing from How does a Celery worker consuming from multiple queues decide which to consume from first?
I've set up a single worker and have it listening to two queues. I understand from the linked question that the worker consumes messages from those two queues in round-robin order or in the order they arrived (depending on the Celery version).
So what's the purpose of this setting? How is it different from a single queue? Would it be helpful only for monitoring, or is there an operational benefit I'm missing?

In most scenarios your worker will be subscribed to only a single queue, but there are scenarios where the ability to subscribe to multiple queues makes sense.
Here is one. Imagine you have a Celery cluster of 10 machines. They perform various tasks, and among them is a task that downloads files from a remote file server. However, the owner of the file server has whitelisted only two of your 10 machine IPs, so only those two machines can download files from that particular server. Typically you would have the Celery workers on these two machines subscribe to an additional queue, called "download" for example, and schedule download tasks by sending them to the "download" queue.
This is a very common scenario in which only a subset of your nodes can do a particular thing (access remote servers - file servers, database servers, etc.).
One could argue, "why not have just the 'download' queue on these two machines?" - but that could waste resources: the two machines would sit idle whenever there are no download tasks, instead of helping with the regular workload.
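A minimal sketch of this setup, assuming a Redis broker (the module name, broker URL, queue name, and task bodies are illustrative, not from the original answer):

```python
# tasks.py - a sketch of the two-queue setup described above.
# Broker URL, queue names, and task bodies are illustrative assumptions.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def resize_image(path):
    # ordinary work that any of the 10 nodes can perform
    ...

@app.task
def download_file(url):
    # work that only the two whitelisted nodes may perform
    ...

# Route download tasks to the dedicated "download" queue; everything
# else keeps going to the default "celery" queue.
app.conf.task_routes = {"tasks.download_file": {"queue": "download"}}
```

The two whitelisted machines would then start their workers with "celery -A tasks worker -Q celery,download", while the remaining eight use "celery -A tasks worker -Q celery" - so the whitelisted pair still helps with the regular workload when no downloads are pending.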

High Scalability Question: How to sync data across multiple microservices

I have the following use case:
Assume you have two microservices, AccountManagement and ActivityReporting, that process event U.
When a user registers, an event U containing the user information is published to a broker for the two microservices to process.
The AccountManagement and ActivityReporting microservices are each replicated across two instances for performance and scalability reasons.
Each microservice instance has a consumer listening on the broker topic. A topic is used so that both AccountManagement and ActivityReporting can process U concurrently.
However, I want only one instance of AccountManagement to process event U, and only one instance of ActivityReporting to process event U.
Please share your experience implementing consume-once-per-application-group semantics in a broker system, as this would effectively solve the problem.
If all your consumer listeners, even those from different instances, share the same group.id property, then only one of them will receive each message. You set this property when you initialise the consumer, so in your case you need one group.id for AccountManagement and another for ActivityReporting.
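The answer above implies Kafka (group.id is a Kafka consumer property). Here is a sketch using the kafka-python client, where the topic name, group ids, and broker address are assumptions:

```python
# consumer.py - consume-once-per-service-group, sketched with kafka-python.
# Topic name, group ids, and bootstrap address are illustrative assumptions.
from kafka import KafkaConsumer

# Every AccountManagement instance uses the same group.id, so Kafka hands
# each message on the topic to exactly one instance in that group.
# ActivityReporting instances would use group_id="activity-reporting".
consumer = KafkaConsumer(
    "user-registered",
    group_id="account-management",
    bootstrap_servers="localhost:9092",
)

for record in consumer:
    print("processing event U:", record.value)
```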
I would recommend Cadence Workflow, which is a much more powerful solution for microservice orchestration.
It offers a number of advantages over using queues for your use case:
Built-in exponential retries with an unlimited expiration interval.
Failure handling. For example, it allows executing a task that notifies another service if both updates could not succeed within a configured interval.
Support for long-running, heartbeating operations.
The ability to implement complex task dependencies, for example chaining of calls or compensation logic in case of unrecoverable failures (SAGA).
Complete visibility into the current state of the update. For example, when using queues all you know is whether there are messages in a queue, and you need an additional DB to track the overall progress; with Cadence every event is recorded.
The ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.

Using Celery with multiple workers in different pods

I'm trying to use Celery with Kubernetes. I'm using Redis as the message broker in a separate pod, and I have one pod per Celery queue.
Imagine I have 3 queues; I would then have 3 different pods (i.e. workers) that can accept and handle requests.
Everything is working fine so far, but my question is: what would happen if I cloned the pod of one of the queues, so that two pods serve a single queue?
I think the client (i.e. Django) creates a new message via Redis to send to a worker and start the job, but it's not clear to me what happens when two pods listen to the same queue. Does the first pod accept the request and start the job, preventing the other pod from accepting it?
(I searched the Celery documentation for clues but couldn't find any; that's why I'm asking this question.)
I assume you are using the basic task type, which employs a 'direct' queue type, not a 'fanout' or 'topic' queue; the latter two behave quite differently and will not be discussed here.
When using Redis as the broker transport, Celery/Kombu use a Redis list object as the storage for a queue (source), using the LPUSH command to publish messages and BRPOP to consume them.
In short, BRPOP (doc) blocks the connection when there are no elements to pop from the given lists; if a list is not empty, an element is popped from its tail. The operation is guaranteed to be atomic: no two connections can get the same element.
Celery leverages this feature to guarantee at-least-once message delivery; the use of acknowledgements doesn't affect this guarantee.
In your case there are multiple Celery workers across multiple pods, but all of them connect to the same Redis server and block on the same key, trying to pop an element from the same list object. When a new message arrives, one and only one worker will get it.
A task message is not removed from the queue until that message has been acknowledged by a worker. A worker can reserve many messages in advance and even if the worker is killed – by power failure or some other reason – the message will be redelivered to another worker.
More: http://docs.celeryproject.org/en/latest/userguide/tasks.html
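To see the BRPOP behaviour in isolation, here is a toy sketch with redis-py; the queue name and payload are simplified assumptions (real Celery messages carry much more metadata):

```python
# brpop_demo.py - a toy illustration of the mechanism described above.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Producer side: LPUSH appends a message, as Celery/Kombu do on publish.
r.lpush("celery", '{"task": "tasks.download_file", "args": ["..."]}')

# Consumer side: any number of workers (pods) can block on the same list,
# but BRPOP is atomic, so each element is handed to exactly one of them.
item = r.brpop("celery", timeout=5)
if item is not None:
    queue_name, message = item
    print("got", message, "from", queue_name)
```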
The two workers (pods) will receive tasks and complete them independently. It's like having a single pod, but processing tasks at twice the speed.

Distributed queue consumers in an unstable network

I'm working on the design of a distributed system. The system consists of multiple producers, distributed queue and multiple consumers aka workers.
Worker instances reside in datacentres in different locations. Sometimes one location is manually disconnected.
In such a case, the issue is that a worker in the disconnected location may have taken a task from the queue and then shut down before completing it. I want:
workers in a live location to be able to pick up such a task and complete it
the disconnected worker, when it finally comes back up, to determine whether the task was already completed by another worker and decide what to do with it
What is a convenient way to solve such an issue?
This design might help you. Every time a worker consumes a task, move the task from the queue to a separate distributed list of consumed (in-progress) tasks, and maintain a timestamp with every task in this list.
The worker that consumed the task should then send a still-alive message every second or so (similar to Hadoop's heartbeat message) that updates the task's timestamp in the consumed-tasks list. This indicates that the worker holding the task is still alive and was heard from recently.
Finally, implement a daemon that monitors the consumed-tasks list and moves any task whose timestamp is older than a threshold number of seconds back to the queue (choosing the threshold to allow for message losses).
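A sketch of this lease/heartbeat pattern using a Redis sorted set scored by last-heartbeat time. The key names and threshold are assumptions, and a production version would make the pop-and-record step atomic (e.g. with a Lua script):

```python
# lease_queue.py - a sketch of the consumed-tasks list with heartbeats.
import time
import redis

r = redis.Redis()

def take_task():
    # Pop a task and record it as in-progress with a heartbeat timestamp.
    item = r.brpop("queue", timeout=5)
    if item is None:
        return None
    _, task_id = item
    r.zadd("in_progress", {task_id: time.time()})
    return task_id

def heartbeat(task_id):
    # Called by the worker every second or so while it holds the task.
    r.zadd("in_progress", {task_id: time.time()})

def complete(task_id):
    r.zrem("in_progress", task_id)

def requeue_stale(threshold_seconds=30.0):
    # The monitoring daemon: any task whose last heartbeat is older than
    # the threshold is assumed orphaned and is moved back onto the queue.
    cutoff = time.time() - threshold_seconds
    for task_id in r.zrangebyscore("in_progress", 0, cutoff):
        if r.zrem("in_progress", task_id):  # only one daemon wins this
            r.lpush("queue", task_id)
```

This also covers the reconnecting worker: if r.zscore("in_progress", task_id) returns None when it comes back up, the task has been requeued or completed by someone else, and the worker should discard its result.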

MSMQ multiple readers

This is my proposed architecture. Process A would create items and add them to queue A on the local machine, and I plan to have multiple instances of a Windows service (running on different machines) reading from queue A. Each of these Windows services would read a set of messages and then process that batch.
What I want to make sure of is that a particular message will not get processed multiple times (by the different Windows services). Does MSMQ guarantee single delivery by default?
Should I make the queue transactional, or would a regular queue suffice?
If you need to make sure that a message is delivered only once, you will want to use a transactional queue. Note, though, that when a service reads a message from the queue it is removed from the queue, so even a regular queue will not hand the same message to two readers.

How to design task distribution with ZooKeeper

I am planning to write an application that will have distributed worker processes. One of them will be the Leader, which will assign tasks to the other processes. Designing the leader election process is quite simple: each process tries to create an ephemeral node at the same path, and whichever succeeds becomes the leader.
Now, my question is how to design the process of distributing the tasks evenly? Any recipe for this?
I'll elaborate a little on the environment setup:
Suppose there are 10 worker machines, each running one process, and one of them becomes the leader. Tasks are submitted to a queue; the leader takes them and assigns them to workers. The worker processes get notified whenever a task is submitted.
I am not sure I understand your algorithm for leader election, but the recommended way of implementing it is to use sequential ephemeral nodes and the algorithm at http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_leaderElection, which explains how to avoid the "herd" effect.
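For reference, kazoo (a Python ZooKeeper client) ships an Election recipe built on that sequential-ephemeral-node algorithm; the connection string, path, and identifier below are assumptions:

```python
# election.py - leader election via kazoo's Election recipe, which uses
# sequential ephemeral znodes to avoid the herd effect described above.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def lead():
    # Runs only while this process holds leadership; returning (or the
    # session expiring) hands leadership to the next contender in line.
    print("elected leader; assigning tasks...")

election = zk.Election("/app/election", identifier="worker-1")
election.run(lead)  # blocks until elected, then invokes lead()
```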
Distribution of tasks can be done with a simple distributed queue and does not strictly need a leader. The producer enqueues tasks, and consumers keep a watch on the tasks node; a triggered watch leads a consumer to take a task and delete the associated znode. There are certain edge conditions to consider when requeuing tasks from failed consumers. http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_recipes_Queues
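A leaderless queue along those lines, again sketched with kazoo (the /tasks path and payloads are illustrative; note how the atomic delete acts as the claim, so two consumers can never take the same task):

```python
# task_queue.py - a znode-based distributed queue, as in the recipe above.
from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/tasks")

def enqueue(payload: bytes):
    # Producer: sequential znodes give the queue its ordering.
    zk.create("/tasks/task-", payload, sequence=True)

def take_one():
    # Consumer: claim the oldest available task. The delete is atomic, so
    # whichever consumer deletes the znode first owns the task.
    for name in sorted(zk.get_children("/tasks")):
        path = "/tasks/" + name
        try:
            data, _ = zk.get(path)
            zk.delete(path)  # raises NoNodeError if another consumer won
            return data
        except NoNodeError:
            continue  # lost the race; try the next znode
    return None
```

kazoo also ships ready-made Queue and LockingQueue recipes that wrap this pattern, including handling for the failed-consumer requeuing edge cases mentioned above.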
I would recommend the section "Example: Master-Worker Application" of the book ZooKeeper: Distributed Process Coordination, http://shop.oreilly.com/product/0636920028901.do
The example demonstrates how to distribute tasks to workers using znodes and common ZooKeeper commands.
Consider using an actor singleton service pattern. For example, in Scala there is Akka, which solves this class of problem with less code.