Distributed queue consumers in an unstable net - queue

I'm working on the design of a distributed system. The system consists of multiple producers, a distributed queue, and multiple consumers (workers).
Worker instances reside in datacentres in different locations. Sometimes one location is manually disconnected.
In that case, a worker in the disconnected location may have taken a task from the queue and then shut down before completing it. I want:
workers in a live location to be able to pick up such a task and complete it
a disconnected worker, when it finally comes back up, to determine whether the task was already completed by another worker and decide what to do with it
What is a convenient way to solve such an issue?

This design might help you. Every time a worker consumes a task, move the task from the queue to a separate distributed list of consumed tasks, and store a timestamp with each task in that list.
The worker that consumed the task should then send a still-alive message every second or so (similar to Hadoop's heartbeat message), which updates the task's timestamp in the consumed-tasks list. A recent timestamp indicates that the worker holding the task is still alive.
Now implement a daemon that monitors the consumed-tasks list and moves any task whose timestamp is older than a threshold number of seconds back to the queue. Choose the threshold so that a few lost messages are tolerated.
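A minimal sketch of this scheme, assuming an in-memory stand-in for the distributed list. The class name ConsumedTasks, its methods and the injectable clock are hypothetical, chosen only for illustration. A real deployment would keep this state in a shared store and make each step atomic.

```python
import time


class ConsumedTasks:
    """In-memory stand-in for the queue plus the consumed-tasks list."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.queue = []      # pending task ids
        self.consumed = {}   # task id -> (worker id, last heartbeat time)

    def take(self, worker_id):
        """Move the next task from the queue to the consumed list."""
        if not self.queue:
            return None
        task_id = self.queue.pop(0)
        self.consumed[task_id] = (worker_id, self.clock())
        return task_id

    def heartbeat(self, worker_id, task_id):
        """Refresh the timestamp; False means the task is no longer ours."""
        owner = self.consumed.get(task_id)
        if owner is None or owner[0] != worker_id:
            return False
        self.consumed[task_id] = (worker_id, self.clock())
        return True

    def complete(self, worker_id, task_id):
        """Finish a task only if this worker still owns it."""
        owner = self.consumed.get(task_id)
        if owner is None or owner[0] != worker_id:
            return False
        del self.consumed[task_id]
        return True

    def requeue_stale(self):
        """Daemon step: move tasks with an old timestamp back to the queue."""
        now = self.clock()
        stale = [t for t, (_, seen) in self.consumed.items()
                 if now - seen > self.timeout_s]
        for task_id in stale:
            del self.consumed[task_id]
            self.queue.append(task_id)
        return stale
```

In this sketch, a worker that comes back after a disconnect calls heartbeat or complete for its old task. A False result tells it the task was requeued or taken over by another worker, so it can discard its partial work.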

Related

How are background workers usually implemented for polling a message queue?

Say you have a message queue that needs to be polled every x seconds. What are the usual ways to poll it and execute HTTP/REST-based jobs? Do you simply create a cron service and call the worker script every x seconds?
Note: This is for a web application
I would write a Windows service which constantly polls/waits for new messages.
Scheduling a program to run every x minutes has a number of problems:
If your interval is too small, the program will still be running when the next startup is triggered.
If your interval is too big, the queue will fill up between runs.
Generally you expect a constant stream of messages, so there is no problem with just keeping the program running 24/7.
One common feature of the message queue systems I've worked with is that you don't poll but use a blocking read. If you have more than one waiting worker, the queue system will pick which one gets to process the message.
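As a rough illustration of a blocking read, here is a sketch that uses Python's standard-library queue.Queue as a stand-in for a real message queue system. The function names worker and run and the None shutdown sentinel are assumptions made for this example.

```python
import queue
import threading


def worker(name, jobs, results):
    while True:
        job = jobs.get()          # blocks until a message is available
        if job is None:           # sentinel: shut this worker down
            jobs.task_done()
            break
        results.append((name, job))
        jobs.task_done()


def run(messages, n_workers=2):
    jobs = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(f"w{i}", jobs, results))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for m in messages:
        jobs.put(m)
    for _ in threads:
        jobs.put(None)            # one sentinel per worker
    jobs.join()                   # wait until every item is processed
    for t in threads:
        t.join()
    return results
```

No worker sleeps or polls on a timer here. Each one simply waits inside get(), and the queue decides which waiting worker receives the next message.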

Queue processing one by one using RabbitMQ

I have a limited number of workers and an unlimited number of queues whose names match the mask "q.*" (e.g. q.1, q.2). I need to process them in turn, one task per worker. When a worker finishes its task, it receives a new one from the next existing queue.
E.g. I have queues:
q.1: task11, task12, task13
q.2: task21, task22, task23
And three workers. I expect the following order of execution:
worker1: task11
worker2: task21
worker3: task12
worker1: task22
worker2: task13
worker3: task23
I tried using a topic and subscribing to the mask q.*, but as a result each worker receives tasks from all queues. What is the correct approach?
Think of each queue as its own bucket of work. q.1 has no relation to q.2 at all and in fact doesn't even know it exists. It may process things at a different rate from q.2 and should have different consumers. A worker on q.1 should only be concerned with q.1; it shouldn't bounce back and forth between q.1 and q.2.
Are you trying to chain 2 queues together? If so you could have something like this:
Message gets put into q.1
Message is processed by a worker (call it worker1) of q.1
After worker1 acks the message it then inserts a new message into q.2
Message is processed by a worker (call it worker2) of q.2
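The chained flow above can be sketched with two in-process queues. This uses the standard-library queue.Queue rather than RabbitMQ, with task_done() standing in for the broker ack. The helper run_stage and the handler functions are hypothetical.

```python
import queue


def run_stage(inbox, outbox, handle):
    """Process every message in inbox; forward each result to outbox."""
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            return
        result = handle(msg)
        inbox.task_done()        # stands in for acking the message
        if outbox is not None:
            outbox.put(result)   # insert a new message into the next queue


q1, q2 = queue.Queue(), queue.Queue()
for task in ["task11", "task12"]:
    q1.put(task)

finished = []
run_stage(q1, q2, lambda m: m + ":step1")              # worker1 on q.1
run_stage(q2, None, lambda m: finished.append(m) or m)  # worker2 on q.2
```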

Zookeeper priority queue

My problem is as follows:
I have n state-based database crawlers that run indefinitely.
How it currently works:
We are using a single machine for crawling.
We have three priority queue levels: High, Medium and Low.
At startup, all database jobs are put into the lowest-priority queue.
A worker reads a job from the queue and performs the operation.
After finishing the job, it reschedules it with a delay of 5 minutes.
What I have found so far
For the priority queue I can use: http://zookeeper.apache.org/doc/r3.2.2/recipes.html#sc_recipes_priorityQueues
Problems I am still trying to solve:
1. How to reschedule a job in the queue with a future scheduled time. Is there a way to do that in ZooKeeper?
2. Cancelling an already started job. Suppose a user changes their database authentication details. I want to stop the job already running for that database and restart it with the new details. My idea is that when a worker starts a job, it subscribes to changes on that job's znode, and if something changes, it stops the job and reschedules it.
3. Infinite queue. My idea is that after finishing a job, the worker removes it from the queue and re-adds it with a future scheduled time (the implementation depends on point 1). Is this the correct way to run a never-ending task?

Least load scheduler

I'm working on a system that uses several hundred workers in parallel (physical devices evaluating small tasks). Some workers are faster than others, so I was wondering what the easiest way is to load-balance tasks across them without a priori knowledge of their speed.
I was thinking about keeping track of the number of tasks each worker is currently working on with a simple counter, then sorting the list to get the worker with the lowest active task count. This way slow workers would get some tasks but not slow down the whole system. The reason I'm asking is that the current round-robin method causes hold-ups: some really slow workers (100 times slower than others) keep accumulating tasks and blocking new ones.
It should be a simple matter of sorting the list according to the current number of active tasks, but since I would be sorting the list several times a second (average work time per task is below 25ms), I fear that this might become a major bottleneck. So is there a simple way to get the worker with the lowest task count without having to sort over and over again?
EDIT: The tasks are pushed to the workers via an open TCP connection. Since the dependencies between the tasks are rather complex (exclusive resource usage) let's say that all tasks are assigned to start with. As soon as a task returns from the worker all tasks that are no longer blocked are queued, and a new task is pushed to the worker. The work queue will never be empty.
How about this system:
Worker reaches the end of its task queue
Worker requests more tasks from load balancer
Load balancer assigns N tasks (where N is probably more than 1, perhaps 20 - 50 if these tasks are very small).
In this system, since you are assigning new tasks when the workers are actually done, you don't have to guess at how long the remaining tasks will take.
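A small simulation of this pull-based scheme, as a sketch only. The LoadBalancer class, the simulate function and the tick-based worker speeds are assumptions made for illustration and do not model the real devices or the TCP transport.

```python
from collections import deque


class LoadBalancer:
    def __init__(self, tasks, batch_size=20):
        self.pending = deque(tasks)
        self.batch_size = batch_size

    def request_tasks(self):
        """Called by a worker whose local task queue is empty."""
        n = min(self.batch_size, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


def simulate(task_count, speeds, batch_size):
    """speeds maps worker name -> tasks finished per tick.

    Returns the number of tasks each worker completed.
    """
    lb = LoadBalancer(range(task_count), batch_size)
    local = {w: deque() for w in speeds}
    done = {w: 0 for w in speeds}
    while lb.pending or any(local.values()):
        for w, speed in speeds.items():
            if not local[w]:                      # out of work: ask for more
                local[w].extend(lb.request_tasks())
            for _ in range(min(speed, len(local[w]))):
                local[w].popleft()
                done[w] += 1
    return done


result = simulate(1000, {"fast": 100, "slow": 1}, batch_size=20)
```

Because a worker only receives a new batch after draining its previous one, the slow worker ends up with a small share of the tasks and the balancer never has to sort or track per-worker load. A smaller batch size limits how much work can get stuck behind a slow worker, at the cost of more requests.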
I think that you need to provide more information about the system:
How do you get a task to a worker? Does the worker request it or does it get pushed?
How do you know if a worker is out of work, or even how much work it is doing?
How are the physical devices modeled?
What you want to do is avoid tracking anything and find a more passive way to distribute the work.

How to design task distribution with ZooKeeper

I am planning to write an application which will have distributed Worker processes. One of them will be the Leader, which will assign tasks to the other processes. Designing the Leader election process is quite simple: each process tries to create an ephemeral node at the same path. Whichever succeeds becomes the Leader.
Now, my question is how to design the process of distributing the tasks evenly? Any recipe for this?
I'll elaborate a little on the environment setup:
Suppose there are 10 worker machines, each running one process, and one of the processes becomes the Leader. Tasks are submitted to the queue; the Leader takes them and assigns each to a worker. The worker processes get notified whenever a task is submitted.
I am not sure I understand your algorithm for Leader election, but the recommended way of implementing this is to use sequential ephemeral nodes and use the algorithm at http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_leaderElection which explains how to avoid the "herd" effect.
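To make the sequential-node idea concrete, here is a sketch of the selection logic only. It does not talk to ZooKeeper; it takes the child znode names a client would have listed and works out who leads and which znode each follower should watch. The names such as n_0000000001 and the function names are hypothetical.

```python
def sequence_of(znode):
    """Extract the counter ZooKeeper appends to a sequential znode name."""
    return int(znode.rsplit("_", 1)[1])


def election_roles(children):
    """Map each znode to the znode it should watch (None for the leader).

    The znode with the lowest sequence number is the leader. Every other
    znode watches only its immediate predecessor, so a failure wakes a
    single process instead of the whole herd.
    """
    ordered = sorted(children, key=sequence_of)
    roles = {ordered[0]: None} if ordered else {}
    for prev, node in zip(ordered, ordered[1:]):
        roles[node] = prev
    return roles


roles = election_roles(["n_0000000003", "n_0000000001", "n_0000000002"])
```

When the watched predecessor disappears, a process would list the children again and recompute its role. It becomes leader only if its own znode now has the lowest sequence number.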
Distribution of tasks can be done with a simple distributed queue and does not strictly need a Leader. The producer enqueues tasks and consumers keep a watch on the tasks node - a triggered watch will lead the consumer to take a task and delete the associated znode. There are certain edge conditions to consider with requeuing tasks from failed consumers. http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_recipes_Queues
I would recommend the section "Example: Master-Worker Application" of the book ZooKeeper: Distributed Process Coordination http://shop.oreilly.com/product/0636920028901.do
The example demonstrates how to distribute tasks to workers using znodes and common ZooKeeper commands.
Consider using an actor singleton service pattern. For example, in Scala there is Akka, which solves this class of problem with less code.