Celery vs IPython Parallel

I have looked at the documentation for both, but am not sure which is the better choice for a given application. I have looked more closely at Celery, so the example will be given in those terms.
My use case is similar to this question, with each worker loading a large file remotely (one file per machine); however, I also need workers to hold persistent objects. So, if a worker completes a task and returns a result and is then called again, I need it to use a previously created variable for the new task.
Repeating the object creation on every task call is far too wasteful. I haven't seen a Celery example that leads me to believe this is possible; I was hoping to use the worker_init signal to accomplish it.
Finally, I need a central hub to keep track of what all the workers are doing. This seems to imply a client-server architecture rather than the one provided by Celery; is this correct? If so, would IPython Parallel be a good choice given these requirements?

I'm currently evaluating Celery vs IPython Parallel as well. Regarding a central hub to keep track of what the workers are doing, have you checked out the Celery Flower project here? It provides a webpage that allows you to view the status of all tasks in the queue.

Related

Asynchronous Computation in Scala.js Diode

I have a user interface with a button that executes the function longComputation(x: A): A and then updates the user interface (particularly the model) with the new result. This function may take a long time to compute its result and should therefore run in parallel.
Diode provides me with Effect, PotAction, and AsyncAction. I read the documentation about Effects and PotActions/AsyncActions, but I cannot even get a simple example to work.
Can someone point me to, or provide, a simple working example?
I created a ScalaFiddle based on the SimpleCounter example. There is a LongComputation button, which should run in parallel, but does not.
In JavaScript you cannot run things in parallel without using Web Workers because the JS engine is single-threaded. Web Workers are more like separate processes than threads, as they don't share memory and you need to send messages to communicate between workers and the main thread.
I have fewer than 50 reputation points, so I can't comment and have to post a new answer instead of commenting on ochrons' answer:
As mentioned, Web Workers communicate via message passing and share no state. This concept is somewhat similar to Akka; there is even Akka.js, which lets you use actor systems in Scala.js and therefore in the browser.

Scheduling/delaying of jobs/tasks in Play framework 2.x app

In a typical web application, there are some things that I would prefer to run as delayed jobs/tasks. They tend to have some or all of the following properties:
Takes a long time (anywhere from multiple seconds to multiple minutes to multiple hours).
Occupies some resource heavily (CPU, network, disk, external API limits, etc.)
Result not immediately necessary. Can complete HTTP response without it. OK (and possibly even preferable) to delay until later.
Can be (and possibly should be) run on (a) different machine(s) than the web server(s). The machine(s) are potentially dedicated job/task runners.
Should be run in response to other event(s), or started periodically.
What would be the preferred way(s) to set up, enqueue, schedule, and run delayed jobs/tasks in a Scala + Play Framework 2.x app?
For more details...
The pattern I have used in the past, and which I would like to replicate if applicable, is:
In the handler of a web request, or in a cron-like call, enqueue job(s)
In job runner(s), repeatedly dequeue and run one job at a time
Possibly record job results
This seems to be a relatively simple yet still relatively flexible pattern.
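The enqueue / dequeue-one-at-a-time / record-results pattern can be sketched in-process with Python's standard library. This only illustrates the shape of the pattern and is not a Play or Akka solution: `queue.Queue` stands in for a real broker, the dict stands in for a results store, and `worker_loop`, `enqueue` and `demo` are hypothetical names.

```python
import queue
import threading

def worker_loop(jobs, results):
    """Job runner: repeatedly dequeue and run ONE job at a time, recording each result."""
    while True:
        item = jobs.get()
        if item is None:                  # sentinel: no more work, shut down cleanly
            break
        job_id, func, args = item
        try:
            results[job_id] = ("ok", func(*args))
        except Exception as exc:          # record the failure instead of crashing the runner
            results[job_id] = ("error", repr(exc))

def enqueue(jobs, job_id, func, *args):
    """What a web request handler or a cron-like trigger would call."""
    jobs.put((job_id, func, args))

def demo():
    jobs = queue.Queue()                  # stands in for a broker (Redis, RabbitMQ, ...)
    results = {}                          # stands in for a result store; only the runner writes it
    runner = threading.Thread(target=worker_loop, args=(jobs, results))
    runner.start()
    enqueue(jobs, "total", sum, [1, 2, 3])
    enqueue(jobs, "broken", lambda a, b: a / b, 1, 0)
    jobs.put(None)                        # ask the runner to stop after the queued jobs
    runner.join()
    return results

if __name__ == "__main__":
    print(demo())
```

In a real deployment the queue would live outside the web server process (in a broker), which is what allows the runners to sit on separate, dedicated machines.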
Examples I have encountered in the past include:
Updating derived data in DB
Analytics/tracking API calls for a web request
Delete expired sessions or other stale/outdated DB records
Periodic batch ETLs
In other languages/frameworks, I would typically use a job/task framework. Examples include:
Resque in a Ruby + Rails app
Celery in a Python + Django app
I have found the following existing materials, but unfortunately, I don't think they fit my use case directly.
Play 1.x asynchronous jobs API (+ various SO questions referencing it). Appears to have been removed in the 2.x line, with no reference to what replaced it.
Play 2.x Akka integration. Seems very general-purpose. I'd imagine it's possible to use Akka for the above, but I'd prefer not to write a jobs/tasks framework if one already exists. Also, no info on how to separate the job runner machine(s) from your web server(s).
This SO answer. Seems potentially promising for the "short to medium duration IO bound" case, e.g. analytics calls, but not necessarily for the "CPU bound" case (probably shouldn't tie up CPU on web server, prefer to ship off to different node), the "lots of network" case, or the "multiple hour" case (probably shouldn't leave that in the background on the web server, even if it isn't eating up too many resources).
This SO question, and related questions. Similar to above, it seems to me that this covers only the cases where it would be appropriate to run on the same web server.
Some further clarification on use-cases (as per commenters' request). There are two main use-cases that I have experienced with something like resque or celery that I am trying to replicate here:
An event on the site triggers the task. (Most often, an incoming web request causes a task to be enqueued.)
Task should run periodically. (Most often, this is implemented as: periodically, enqueue task to be run as above.)
In the case of resque or celery, the tasks enqueued by both use-cases enter queues the same way and are treated the same way by the runner/worker process. Barring other Scala or Play-specific considerations, that would be my initial guess for how to approach this.
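The "periodically enqueue, then treat it like any other job" approach can be sketched with Python's standard `sched` module. The point is that the periodic trigger and the event trigger both append to the same queue, so the runner never needs to know where a job came from. `FakeClock`, `simulate` and the job names are hypothetical; the fake clock only makes the example deterministic, and a real scheduler would use the defaults (`time.monotonic` and `time.sleep`) instead.

```python
import sched
from collections import deque

class FakeClock:
    """Deterministic stand-in for time.monotonic/time.sleep so the demo runs instantly."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

def simulate(period, horizon):
    """Return the jobs enqueued between t=0 and t=horizon (in simulated seconds)."""
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    task_queue = deque()                         # ONE queue shared by both kinds of trigger

    def enqueue_periodic():
        # Periodic trigger: enqueue the job, then re-arm for the next period.
        task_queue.append(("cleanup_expired_sessions", clock.time()))
        next_at = clock.time() + period
        if next_at <= horizon:
            scheduler.enterabs(next_at, 1, enqueue_periodic)

    def on_web_request():
        # Event trigger: a request handler enqueues work into the same queue.
        task_queue.append(("update_derived_data", clock.time()))

    scheduler.enterabs(0.0, 1, enqueue_periodic)
    scheduler.enterabs(4.5, 1, on_web_request)   # pretend a request arrives at t=4.5
    scheduler.run()
    return list(task_queue)

if __name__ == "__main__":
    for job in simulate(period=3, horizon=10):
        print(job)
```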
Some further clarification on why I do not believe the Akka scheduler fits my use case out-of-the-box (as per commenters' request):
While it is no doubt possible to construct a fitting solution using some combination of the Akka scheduler (for periodic jobs), akka-remote and akka-cluster (for communicating between the job caller and the job runner), that approach requires a certain amount of glue code which is almost a delayed job framework in and of itself. If it exists, I would prefer to use an existing out-of-the-box solution rather than reinvent the wheel.

Scheduling a Java method with persistence

I need to call a particular method daily (or more often), bearing in mind that the app may restart and the machine may reboot.
I saw examples where they just put the thread to sleep, but I need persistence that survives system reboots.
I have to be sure that if I switch off my machine, task execution resumes when I reboot it.
I found schedulers such as cron4j and Quartz, but I can't tell whether this is possible with them and, if it is, how to do it.
With Quartz you will only need to configure it with a persistent job store implementation and that is pretty much all there is to it. I suggest that you read through the Quartz scheduler tutorial, especially the chapter that describes Quartz job stores.
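Quartz itself is a Java library configured with a job store such as its JDBC store; purely as an illustration of what a persistent job store buys you, here is a minimal sketch in Python using `sqlite3`. This is not Quartz's API: the `JobStore` class, the table layout and the catch-up policy are hypothetical. The idea is that each job's next fire time is written to disk, so after a reboot the scheduler reloads it and runs any job whose time has already passed.

```python
import os
import sqlite3
import tempfile

DAY = 24 * 60 * 60

class JobStore:
    """Hypothetical minimal persistent job store: next fire times live in a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs (name TEXT PRIMARY KEY, next_run REAL, period_s REAL)"
        )
        self.conn.commit()

    def schedule(self, name, first_run, period_s):
        self.conn.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)", (name, first_run, period_s))
        self.conn.commit()

    def run_due(self, now, handlers):
        """Run every job whose next_run <= now, including runs missed while the machine was off."""
        ran = []
        rows = self.conn.execute(
            "SELECT name, next_run, period_s FROM jobs WHERE next_run <= ?", (now,)
        ).fetchall()
        for name, next_run, period_s in rows:
            # Run first, then persist: a crash in between means the job runs again after restart.
            handlers[name]()
            ran.append(name)
            # Skip past missed periods so a long outage triggers one catch-up run, not many.
            while next_run <= now:
                next_run += period_s
            self.conn.execute("UPDATE jobs SET next_run = ? WHERE name = ?", (next_run, name))
        self.conn.commit()
        return ran

    def next_run(self, name):
        row = self.conn.execute("SELECT next_run FROM jobs WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None

    def close(self):
        self.conn.close()

def demo():
    calls = []
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    store = JobStore(path)
    store.schedule("daily_report", first_run=0.0, period_s=DAY)
    store.close()                          # simulate the app/machine going down

    store = JobStore(path)                 # "reboot": the schedule is reloaded from disk
    ran = store.run_due(now=2.5 * DAY, handlers={"daily_report": lambda: calls.append("report")})
    next_time = store.next_run("daily_report")
    store.close()
    return ran, calls, next_time
```

Whether a missed run should fire once, fire once per missed period, or be skipped is a policy choice; Quartz calls these "misfire" instructions, and the single catch-up run above is just one assumed option.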

Batch processing with celery?

I am using celery to process and deploy some data and I would like to be able to send this data in batches.
I found this:
http://docs.celeryproject.org/en/latest/reference/celery.contrib.batches.html
But I am having problems with it, such as:
It does not play nicely with eventlet, putting exceptions in the log files stating that the timer is null after the queue is empty.
It seems to leave additional hanging threads after calling celery multi stop.
It does not appear to adhere to the standard logging of a typical Task.
It does not appear to retry the task when raise mytask.retry() is called.
I am wondering whether others have experienced these problems, and whether there is a solution.
I am fine with implementing batch processing on my own, but I do not know a good strategy to make sure that all items are deployed (i.e. even those at the end of the thread).
Also, if the batch fails, I would want to retry the entire batch. I am not sure of any elegant way to do that.
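One way to do the batching yourself, covering both concerns above (items left over at the end, and retrying a failed batch as a unit), is a small buffer that flushes when it reaches a size limit and again when it is closed, retrying the whole batch on failure. This is plain Python, not `celery.contrib.batches`; `Batcher` and `send_batch` are hypothetical names, and the retry policy (immediate, fixed number of attempts) is an assumption.

```python
class Batcher:
    """Hypothetical batching helper: buffers items and sends them as one batch."""

    def __init__(self, send_batch, batch_size, max_attempts=3):
        self.send_batch = send_batch      # callable that takes a list of items
        self.batch_size = batch_size
        self.max_attempts = max_attempts
        self.buffer = []
        self.failed = []                  # batches that exhausted their retries

    def add(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, retrying the WHOLE batch on failure."""
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.send_batch(batch)
                return
            except Exception:
                if attempt == self.max_attempts:
                    self.failed.append(batch)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()                      # make sure the last, partial batch is sent too
        return False
```

Usage: `with Batcher(send, batch_size=100) as b:` followed by `b.add(item)` for each item. A long-running consumer would usually also flush on a timer, so a partial batch is not held indefinitely when no new items arrive. With Celery, `send_batch` could enqueue a single task for the whole batch, so that task's retries apply to the batch as a unit.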
Basically, I am looking for any viable solution for doing real batch processing with celery.
I am using Celery v3.0.21
Thanks!

Queue suggestions for deferred execution for a one-off task

I'm looking for a lightweight system that will let me queue up a one-off (non-recurring) task and have it execute at a specific time in the future.
This is for the backend of a game where the user does tasks that are time-based. I need the server to check the status of the user's "job" at the completion time and perform the necessary housekeeping on their game state.
I'm somewhat familiar with Redis, Celery, Beanstalkd, ZeroMQ, et al., but I haven't found any info on scheduling a single unit of work to be executed in the future (or popped off the queue at a set time). Celerybeat has a scheduler for cron-style recurring tasks, but I didn't see anything for one-off tasks.
I've also seen the "at" command in *nix, but I'm not aware of any frontend for it that can help me manage the jobs.
I realize there are some easy solutions such as ordering keys in Redis and doing a blocking pop, but I'd like to not have to continuously poll a queue to see if the next job is ready.
The closest I've found is the deferred library on GAE, but I was hoping for something that runs on my own Linux box along with my other components.
I'd appreciate any suggestions!
Celery allows you to specify a countdown or an ETA when calling a task, so that it is executed at a later time.
The documentation says it best:
http://docs.celeryproject.org/en/latest/userguide/calling.html#eta-and-countdown
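In Celery the call takes the form `some_task.apply_async(args, countdown=60)` or `some_task.apply_async(args, eta=some_datetime)`. To illustrate the underlying idea the question asks about (a consumer that sleeps until the next job is due instead of polling), here is a minimal standard-library sketch of a delay queue. `DelayQueue` and the job names are hypothetical, and jobs live only in this process's memory, so they are lost if the process exits.

```python
import heapq
import itertools
import threading
import time

class DelayQueue:
    """Hypothetical in-memory delay queue: get() blocks until the earliest job is due."""

    def __init__(self):
        self._heap = []                        # entries are (due_time, seq, job)
        self._seq = itertools.count()          # tie-breaker so jobs themselves are never compared
        self._cond = threading.Condition()

    def put(self, job, delay):
        due = time.monotonic() + delay
        with self._cond:
            heapq.heappush(self._heap, (due, next(self._seq), job))
            self._cond.notify()                # wake a waiting consumer: the new job may be earlier

    def get(self):
        with self._cond:
            while True:
                if not self._heap:
                    self._cond.wait()          # nothing scheduled: sleep until put() is called
                    continue
                due, _, job = self._heap[0]
                remaining = due - time.monotonic()
                if remaining <= 0:
                    heapq.heappop(self._heap)
                    return job
                # Sleep until the job is due, or until put() adds something earlier.
                self._cond.wait(timeout=remaining)
```

A consumer thread would simply loop on `handle(q.get())`; it blocks without polling and wakes only when a job is due or a new, possibly earlier, job is added.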