Many popular task queues (such as Google GAE TaskQueue, Celery) have the ETA/Countdown feature, which allows a task to be put into the queue after xxx seconds.
I am working on a project that needs a task queue with the ETA feature. However, there are some limitations that I have to use the Google Pubsub messaging system. Pubsub does not have the ETA feature. I am wondering how to implement a reliable and scalable ETA mechanism for a task queue. Both general architecture ideas and actual code samples are welcome.
Our system enqueues 600-2000 tasks/second, and about 10% of them need to have ETA. It is a distributed system and performance-critical.
I tried to trace the source code of celery, but couldn't find the actual logic of handling the ETA. It would also be good if someone can point me to the file/code of Celery that handle ETA.
I think I might have found how Celery did it. In eventlet.py, it uses eventlet's spawn_after feature to delay the worker creation "ETA" seconds.
secs = max(eta - monotonic(), 0)
g = self._spawn_after(secs, entry)
Related
I am building an application that will need to run hundreds of short running tasks every minute. These functions are not doing anything special other than making calls to an HTTP endpoint. I need a reliable mechanism for scheduling these invocations every minute indefinitely. Failures to run at the scheduled time cannot be tolerated. I have considered the following options for the scheduler:
AWS Lambda
Mesosphere Chronos
Cron
Python Celery
Obviously there is a trade off between cost, maintainability (I will need to update the logic of these functions every once in a while), and reliability.
My question is, which of these options would be the most appropriate if I am most concerned about consistency/reliability? Are there options I'm missing that I should consider?
As you already mentioned, there are multiple technologies that could help you do this, I would say that the trick is more to find the logic flow/model to use.
For example, If the number of tasks are not fixed, a publish/subscribe pattern could apply, for this something like rabbitMQ or AWS SQS could be used.
There are multiple ways about how to submit a task to the queue and also how to de-queue, you could have multiple workers reading/waiting for events in where they could read one by one or by chunks (based on the num of cores per server) all this bound to the speed and precision you may want.
Scaling I would say is easier since if need more speed (precision to do all tasks every minute) just need to add more workers.
For more ideas check this article Using AWS Lambda with Amazon DynamoDB it covers a stream-based model / event-sourcing.
I have a project that needs to be written in Perl so I've chosen ZeroMQ.
There is a single client program, generating work for a variable number of workers. The workers are real human operators who will complete a task then request a new task. The job of the client program is keep all available workers busy all day. It's a call center.
So each worker can only process one task at time, and there may be some time before requesting a new task. And the number of workers may vary during the day.
The client needs to keep a queue of tasks ready to give to workers as and when they request them. Whenever the client queue gets low the client can generate more tasks to top-up the queue.
What design pattern (i.e. what ZeroMQ Socket combination) should I use for this? I've skimmed through all the patterns in the 0MQ Guide and can't find anything that matches this.
Thanks
Sure. ... there is not a single, solo Archetype to match the Requirement List use several ZeroMQ Scalable Formal Communication Patterns
Typical software Project uses many ZeroMQ sockets ( with various Archetypes ) as a certain form of node-node signalisation and message-passing platform.
It is fair to note, that automated Load-Balancers may work fine for automated processes, but not always so for processes, executed by Humans or interacting with Humans.
Humans ( both the Call centre Agents and their Line-Supervisors ) introduce another layer of requirements - sometimes with a need to introduce non-just-Round-Robin workload distribution logic, sometimes need to switch a call from Agent A to another Agent B ( which a trivial archetype will simply not be capable of and might get into troubles, if it's hardwired-logic runs into a collision ( mutually blocked REQ-REP stale-mate being one such example ).
So simply forget to wait for one super-powered archetype, but rather create a smart network of behaviours, that will cover your distributed-computing problem desired event-handling.
There are many other aspects, one ought learn before taking the first ZeroMQ socket into service.
failure resillience
performance scaling
latency-profiling ( high-priority voice-traffic, vs. low-priority logging )
watchdog acknowledgements and timeout situations handling
cross-compatibility issues ( version 2.1x vs 3.x vs 4.+ API )
processing robustness against a malfunctioning agent / malicious attack / deadly spurious traffic storms ... to name just a few of problems
all of which has some built-ins in the ZeroMQ toolbox, some of which may need some advanced thinking, so as to handle known constraints.
The Best Next Step?
A would advocate for a fabulous Pieter HINTJENS' book "Code Connected, Volume 1" -- for everyone, who is serious into distributed processing, this is a must-read -- do not hesitate to check other my posts to find a direct URL to a PDF-version of this ZeroMQ Bible.
Worth time and one's tears and sweat.
MarkLogic Scheduled Tasks cannot be configured to run at an interval less than a minute.
Is there any way I can execute an XQuery module at an interval of 1 second?
NOTE:
Considering the situation where the Task Server is fully loaded and I need to make sure that the secondly scheduled task gets the Task Server thread whenever it needs.
Please let me know if there is anything in MarkLogic that can be used to achieve this.
Wanting rapid-fire scheduled tasks may be a hint that the design needs rethinking.
Even running a task once a minute can be risky, and needs careful thought to manage the possibilities of overlapping tasks and runaway tasks. If the application design calls for a scheduled task to run once a second, I would raise that as a potentially serious problem. Back up a few steps, and if necessary ask a new question about the higher-level problem that led to looking at scheduled tasks.
There was a sub-question about managing queue priority for tasks. Task priorities can handle some of that. There are two priorities: normal and higher. The Task Server empties the higher-priority queue first, then the normal queue. But each queue is still a simple queue, and there's no way to change priorities after a task has been spawned. So if you always queue tasks with priority=higher, then they'll all be in the higher priority queue and they'll all run in order. You can play some games with techniques like using server fields as signals to already-running tasks. But wanting to reorder tasks within a queue could be another hint that the design needs rethinking.
If, after careful thought about all the pitfalls and dangers, I decided I needed a rapid-fire task of some kind.... I would probably do it using external requests. Pick any scripting language and write a simple while loop with an HTTP request to the MarkLogic cluster. Even so, spend some time thinking about overlapping requests and locking. What happens if the request times out on the client side? Will it keep running on the server? Will that lead to overlapping requests and require deadlock resolution? Could it lead to runaway resource consumption?
Avoid any ideas that use xdmp:sleep. That will tie up a Task Server thread during the sleep period, and then you'll have two problems.
I'm planning a mechanism whose usage scenarios would be like cron's. It's a clock-ish mechanism that attempts task execution at prespecified times. Cron doesn't seem suitable, because these tasks trigger Scala method calls and the queue stored on a cloud database.
I imagine it like this: every x minutes, tasks' execution dates are retrieved from the database, and compared against current time, if the task is over-due it is executed and removed from queue.
My question is: how do I run the aforementioned check every x minutes on a distributed environment?
All advice encouraged.
I think the Akka scheduler might be what you are looking for. Here's a link to the Akka documentation and here's another link describing how to use Akka in Play.
Update: as Viktor Klang points out Akka is not a scheduler, however it does allow you to run a task periodically. I've used it in this mode quite successfully.
The best known library for this is Quartz Scheduler.
I am wondering if there is some way to delay an akka message from processing?
My use case: For every request I have, I have a small amount of work that I need to do and then I need to additional work two hours later.
Is there any easy way to delay the processing of a message in AKKA? I know I can probably setup an external distributed queue such as ActiveMQ, RabbitMQ which probably has this feature but I rather not.
I know I would need to make the mailbox durable so it can survive restarts or crashes. We already have mongo setup so I probably be using the MongoBasedMailbox for durability.
Temporal Workflow is capable of supporting your use case with minimal effort. You can think about it as a Durable Actor platform. When actor state including threads and local variables is preserved across process restarts.
Temporal offers a lot of other features for task processing.
Built it exponential retries with unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Gives complete visibility into the current state of the update. For example, when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Temporal every event is recorded.
Ability to cancel an update in flight.
Throttling of requests
See the presentation that goes over the Temporal programming model. It talks about Cadence which is the predecessor of Temporal.
It's not ideal, but the Akka Camel Quartz scheduler would do the trick. More heavyweight than the built-in ActorSystem scheduler, but know that Quartz has its own issues.
you could still use the normal Akka scheduler, you will just have to keep a state on the actor persistence to avoid loosing the job if the server restarted.
I have recently used PersistentFsmActor - which will keep the state of the actor persisted
I'm not sure in your case you have to use FSM (Finite State Machine) , so you could basically just use a persistentActor to save the time the job was inserted, and start a scheduler to that time. this way - even if you restarted the server, the actor will start and create a new scheduled job use the persistent data to calculate the time left to run it