Queue suggestions for deferred execution of a one-off task

I'm looking for a lightweight system that will let me queue up a one-off (non-recurring) task and have it execute at a specific time in the future.
This is for the backend of a game where the user does tasks that are time-based. I need the server to check the status of the user's "job" at the completion time and perform the necessary housekeeping on their game state.
I'm somewhat familiar with Redis, Celery, Beanstalkd, ZeroMQ, et al., but I haven't found any info on scheduling a single unit of work to be executed in the future (or popped off the queue at a set time). Celery beat has a scheduler for cron-style recurring tasks, but I didn't see anything for one-off jobs.
I've also seen the "at" command in *nix, but I'm not aware of any frontend for it that can help me manage the jobs.
I realize there are some easy solutions such as ordering keys in Redis and doing a blocking pop, but I'd like to not have to continuously poll a queue to see if the next job is ready.
The closest I've found is the deferred library on GAE, but I was hoping for something that runs on my own Linux box along with my other components.
I'd appreciate any suggestions!

Celery allows you to specify a countdown (a relative delay in seconds) or an ETA (an absolute time) when you call a task.
The documentation says it best:
http://docs.celeryproject.org/en/latest/userguide/calling.html#eta-and-countdown
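For example, a one-off job can be enqueued with a relative countdown or an absolute ETA. A minimal sketch (the app name, broker URL, and task are placeholders, not from your setup):

from datetime import datetime, timedelta, timezone

from celery import Celery

# Hypothetical app name and broker URL, just for illustration.
app = Celery('game', broker='redis://localhost:6379/0')

@app.task
def finish_job(user_id, job_id):
    # The game-state housekeeping for the completed job would go here.
    print('completing job %s for user %s' % (job_id, user_id))

# Run roughly two hours from now...
finish_job.apply_async(args=(42, 'harvest'), countdown=2 * 60 * 60)

# ...or at an explicit UTC time.
finish_job.apply_async(args=(42, 'harvest'),
                       eta=datetime.now(timezone.utc) + timedelta(hours=2))

The worker holds the message until the ETA is reached, so nothing on your side has to poll a queue.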

non-stop workers in celery

I'm creating a distributed web crawler that crawls multiple social media sites at the same time. The system is designed to distribute the available resources among the different sites based on their current post rates.
For example, if social media 1 has 10 new posts per hour and social media 2 has 5 posts per hour, two crawlers focus on social media 1 and one crawler focuses on social media 2 (if we are only allowed three crawlers).
I have decided to implement this project via Celery, Flask, rabbitMQ, and Kubernetes as the resource manager.
I have some questions regarding the deployment:
How can I tell Celery to keep a fixed number of tasks in RabbitMQ? This crawler should never stop crawling and should create new tasks based on each site's post rate (gathered from previous crawling data), but the problem is that I don't have a task submitter for this process. Usually there is a producer that submits tasks to Celery, but there is no such thing in this project. We have a list of social media sites and the number of workers each needs (stored in Postgres), and we need Celery to put a new task into RabbitMQ as soon as a task finishes.
I have tried submitting a task at the end of every job (crawling process), but this approach has a problem and does not scale: the newly submitted job ends up last in the RabbitMQ queue.
I need a system to manage the free workers and assign tasks to them immediately. The system I want should check the free and busy workers and database post rates and give a task to the worker. I think using rabbitMQ (or even Redis) might not be good because they are message brokers which assign a worker to a task in the queue but here, I don't want to have a queue; I want to start a task immediately when a free worker is found. The main reason queueing is not good is that the task should be decided when the job is starting, not before that.
My insights on your problem.
I need a system to manage the free workers and assign tasks to them
immediately.
-- Celery does this job for you
The system I want should check the free and busy workers and database
post rates and give a task to the worker.
-- Celery is a task distribution system; it will distribute the tasks as you expect.
I think using rabbitMQ (or even Redis) might not be good because they
are message brokers which assign a worker to a task in the queue
-- With Celery you definitely need a broker, but the broker just holds your messages; Celery polls the queues and distributes the messages to the right workers (priority, timeout, soft handling, retries).
but here, I don't want to have a queue; I want to start a task
immediately when a free worker is found. The main reason queueing is
not good is that the task should be decided when the job is starting,
not before that.
-- This is essentially a chain reaction: triggering a new job once the previous one is done. If that is the case, you don't even need Celery or a distributed producer-consumer system.
Identify the problem:
1. Do you need a periodic task executed at a point in time? Go with a cron job or celery beat (a cron-style Celery scheduler).
2. Do you need multiple tasks executed without blocking the other running tasks? Then you need a producer-consumer system (Celery out of the box, or native Python consumers on RabbitMQ/Redis).
3. If the same task should trigger the next task, there is no need for multiple workers; what would multiple workers gain you if the work is a single thread? (See the sketch below.)
Outcome -- [Celery, RabbitMQ, and Kubernetes: a good combo for a distributed, orchestrated system] or [a webhook model] or [a recursive Python script]
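A minimal sketch of the chain-reaction case from item 3: a plain Celery task that re-queues itself when it finishes (the task name and broker URL are placeholders):

from celery import Celery

app = Celery('crawler', broker='amqp://localhost')  # assumed broker URL

@app.task
def crawl(media_id):
    # ... fetch and store new posts for this site ...
    # When the crawl finishes, immediately queue the next run for the same
    # site, so there is always exactly one pending task per site and no
    # separate task submitter is needed.
    crawl.apply_async(args=(media_id,))

Whether this is acceptable depends on your point that the task should be decided at start time; the parent-task pattern described below avoids that problem.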
Reply to your comment below, @alavi:
One way of doing it: write a periodic job (it can run every second, every minute, every hour, or whatever rate fits) using celery beat, which acts as the producer or parent task. It can iterate over all media sites from the DB and spawn a new crawl task for each. The work status can be kept in the DB, and new tasks can be spawned based on that status. For a start, the parent task can check whether the previous job is still running, or check the progress of the last task, and decide based on that progress; you could even split the crawl job into micro-tasks that are triggered from the parent job. You can refine the remaining details during development or once you see how it performs.
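A rough sketch of that parent/dispatcher pattern (the DB helpers, the Site fields, and the broker URL below are stand-ins, not a real schema):

from collections import namedtuple

from celery import Celery

app = Celery('crawler', broker='amqp://localhost')  # assumed broker URL

Site = namedtuple('Site', 'id desired_workers')

def get_sites_from_db():
    # Stub: in the real system this reads sites and worker counts from Postgres.
    return [Site('media_1', 2), Site('media_2', 1)]

def count_running_crawls(site_id):
    # Stub: in the real system this checks the crawl-status rows in the DB.
    return 0

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # The parent/producer task runs once a minute via celery beat.
    sender.add_periodic_task(60.0, dispatch_crawls.s(), name='dispatch crawls')

@app.task
def dispatch_crawls():
    # Top up each site so it always has its desired number of crawl tasks.
    for site in get_sites_from_db():
        missing = site.desired_workers - count_running_crawls(site.id)
        for _ in range(missing):
            crawl_site.apply_async(args=(site.id,))

@app.task
def crawl_site(site_id):
    # ... crawl the site and record progress/status in the DB ...
    pass

The dispatcher only submits work when the DB says a site is under its quota, so the queue never fills up with stale tasks.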

Is there a way to process queued webhooks in ADO?

We have a service hook created for one of our projects in ADO. It was working fine until last weekend. Suddenly a few webhooks started getting queued and I am not sure how to force them to be processed. Can someone tell me if there is a way to force those items to get processed?
Thanks,
Venu
I am afraid you cannot get what you want while the process is running.
While the process is running, the queued service hooks will not be picked up again and will not be re-processed.
When the main thread (for example, a work item update) is running, you cannot forcefully intervene in, or flush, the content that is already queued.
There is also a similar issue discussing this situation.
Waiting service hooks are also coupled to memory, because they actually run in memory; if there is an occasional memory problem during execution, there is no guarantee that all service hooks will be executed as expected.
Alternatively, you could interrupt the current process and reduce the number of service hooks attached to it, but that is not a good solution.
So the best way would be a function that can handle the queued service hooks within the process, but currently no such function exists. We therefore recommend you submit a suggestion ticket to the team to request that feature.

Architecting a configurable user notification service

I am building an application which needs to send notifications to users at a fixed time of day. Users can choose which time of day they would like to be notified, and which days they would like to be notified. For example, a user might like to be notified at 6am every day, or 7am only on week days.
On the back-end, I am unsure how to architect the service that sends these notifications. The solution needs to handle:
concurrency, so I can scale my servers (notifications should not be duplicated)
system restarts
if a user changes their preferences, pending notifications should be rescheduled
Using a message broker such as RabbitMQ and a task scheduler such as Celery may meet your requirements.
Asynchronous, or non-blocking, processing is a method of separating the execution of certain tasks from the main flow of a program. This provides you with several advantages, including allowing your user-facing code to run without interruption.
Message passing is a method which program components can use to communicate and exchange information. It can be implemented synchronously or asynchronously and can allow discrete processes to communicate without problems. Message passing is often implemented as an alternative to traditional databases for this type of usage because message queues often implement additional features, provide increased performance, and can reside completely in-memory.
Celery is a task queue that is built on an asynchronous message passing system. It can be used as a bucket where programming tasks can be dumped. The program that passed the task can continue to execute and function responsively, and then later on, it can poll celery to see if the computation is complete and retrieve the data.
While Celery itself is written in Python, its protocol can be implemented in any language; if the language has an AMQP client, there shouldn't be much work needed to create a worker in it. A Celery worker is just a program that connects to the broker to process messages.
There is also another way to be language independent: use REST tasks, where instead of your tasks being functions, they are URLs. You can then create simple web servers that enable preloading of code: expose an endpoint that performs an operation, and create a task that just performs an HTTP request to that endpoint, as sketched below.
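A minimal sketch of that REST-task idea (the endpoint URL and app setup are placeholders):

import requests
from celery import Celery

app = Celery('notifications', broker='amqp://localhost')  # assumed broker URL

@app.task
def call_endpoint(url, payload):
    # The task is just an HTTP request; the endpoint behind the URL can be
    # implemented in any language.
    response = requests.post(url, json=payload, timeout=10)
    response.raise_for_status()
    return response.status_code

# Example call (hypothetical URL):
# call_endpoint.delay('https://example.com/notify', {'user_id': 42})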
Here is the periodic-task example from the official Celery documentation:
from celery import Celery
from celery.schedules import crontab

app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')

    # Calls test('world') every 30 seconds.
    sender.add_periodic_task(30.0, test.s('world'), expires=10)

    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )

@app.task
def test(arg):
    print(arg)
As I see it, you need three types of entities: users (storing an email address or some other way to reach the user), notifications (storing what you want to send to the user, such as the text), and schedules (storing when the user wants to be notified). You need to store these entities in some kind of database.
A schedule should be linked to a user, and a notification should be linked to a user and a schedule.
Assume you have a cron job that starts a script every minute. This script tries to fetch all notifications whose schedule matches the current time (the job's start time). Don't forget to implement some kind of overlap prevention.
The script then places tasks (with all the needed data: type of notification, users to notify, etc.) in a queue (beanstalkd or something similar). You can create as many workers as you want (even on different physical instances) to serve this queue, without worrying about duplication; this gives you great scalability.
If a user changes their schedule, it affects all of their notifications at that moment. There are no pending notifications to reschedule, because notifications are only created when they are actually due to be sent.
This is a very high-level description; many details depend on the language, database(s), queue server, and worker implementation. A rough sketch follows below.
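A sketch of that per-minute script, using Celery as the queue here (beanstalkd or any other queue would work the same way); the DB query and field names are stand-ins:

from datetime import datetime, timezone

from celery import Celery

app = Celery('notifications', broker='amqp://localhost')  # assumed broker URL

def due_notifications(now):
    # Stub: in the real system this is a DB query joining users,
    # notifications, and schedules on the current minute and weekday.
    return []  # e.g. [{'user_id': 1, 'channel': 'email', 'text': 'Good morning'}]

@app.task
def deliver(notification):
    # ... actually send the email / push message here ...
    pass

def run_once():
    # Called by cron (or celery beat) once a minute. Overlap prevention
    # (e.g. a row lock or a Redis SETNX lock) is omitted from this sketch.
    now = datetime.now(timezone.utc)
    for notification in due_notifications(now):
        deliver.delay(notification)

if __name__ == '__main__':
    run_once()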

Intercepting and stopping a celery beat task before publishing to message bus

I am using signals to intercept celery beat tasks before publishing. This works fine. But, in addition I want to execute some logic and, based on the result, possibly cancel the task.
I cannot find a way to cancel the task from the event handler, aside from raising an exception, and that seems very inelegant.
The background is that I am implementing distributed task processing using cache locks and I am performing CAS operations on the lock before publishing.
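The interception part looks roughly like this (a simplified sketch; the Redis-based lock and key naming below stand in for the actual cache/CAS setup):

import redis
from celery.signals import before_task_publish

# Assumed Redis instance used as the shared lock store.
locks = redis.Redis(host='localhost', port=6379, db=0)

@before_task_publish.connect
def acquire_lock_or_cancel(sender=None, headers=None, body=None, **kwargs):
    # 'sender' is the task name. SET NX is the atomic check-and-set, so only
    # one node gets to publish within the lock window.
    if not locks.set('lock:%s' % sender, 'owner', nx=True, ex=60):
        # Raising here is, as noted above, the only way I have found to stop
        # the publish, and it feels inelegant.
        raise RuntimeError('task %s already locked; skipping publish' % sender)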
Is there any way to implement this using current celery/celerybeat functionality?
Thanks

Celery vs Ipython parallel

I have looked at the documentation on both, but am not sure what's the best choice for a given application. I have looked closer at celery, so the example will be given in those terms.
My use case is similar to this question, with each worker loading a large file remotely (one file per machine); however, I also need the workers to hold persistent objects. So if a worker completes a task and returns a result, and is then called again, I need to reuse a previously created variable for the new task.
Repeating the object creation on every task call would be far too wasteful. I haven't seen a Celery example that shows this is possible, but I was hoping to use the worker_init signal to accomplish it.
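Roughly what I have in mind (a sketch only; the loaded object below is just a stand-in for the real remote file, and worker_process_init is used so each worker process loads it once):

from celery import Celery
from celery.signals import worker_process_init

app = Celery('tasks', broker='amqp://localhost')  # assumed broker URL

# Module-level slot for the per-worker persistent object.
big_model = None

@worker_process_init.connect
def load_big_model(**kwargs):
    # Runs once in each worker process before it starts taking tasks.
    # Stand-in for the real expensive load (e.g. fetching the large file).
    global big_model
    big_model = list(range(10 ** 6))

@app.task
def process(item):
    # Every later call handled by this worker process reuses the
    # already-loaded object instead of recreating it.
    return len(big_model), item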
Finally, I need a central hub to keep track of what all the workers are doing. This seems to imply a client-server architecture rather than the one provided by Celery, is this correct? If so, would IPython Parallel be a good choice given the requirements?
I'm currently evaluating Celery vs IPython Parallel as well. Regarding a central hub to keep track of what the workers are doing, have you checked out the Celery Flower project? It provides a web page that lets you view the status of all tasks in the queue.