I am using Celery to implement async tasks, but I have tons of them, so I have to call task_func.delay() many times. My code is as follows:
The registered tasks:
@app.task()
def task1():
    ...

@app.task()
def task2():
    ...

@app.task()
def task3():
    ...

@app.task()
def task4():
    ...

@app.task()
def task5():
    ...

@app.task()
def task6():
    ...

# ... and so on
I call the Celery tasks elsewhere in my code, such as in Django views:
task1.delay()
task2.delay()
task3.delay()
task4.delay()
task5.delay()
task6.delay()
# ... and so on
With the above code, each time I create a new Celery task function I have to call it with delay(). Is there any way I can call all the tasks together?
Sure there is. The Canvas: Designing Workflows section in the Celery documentation explains how. In your particular case, if I understood it well, you need to use the Chain primitive.
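For instance, a minimal sketch with the tasks above (assuming they take no arguments; .si() creates immutable signatures so that one task's return value is not passed to the next):

from celery import chain

# .si() = immutable signature: each task ignores the previous task's
# return value; plain .s() would pass results down the chain
workflow = chain(task1.si(), task2.si(), task3.si(),
                 task4.si(), task5.si(), task6.si())
workflow.delay()

If the tasks are actually independent of one another, the group primitive from the same Canvas chapter runs them in parallel instead: group(task1.si(), task2.si(), ...).delay().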
Using:
celery==5.2.7
django-celery-results==2.4.0
django==4.1
pytest==7.1.2
pytest-django==4.5.2
pytest-celery==0.0.0
I'm trying to test a task (start_task) that creates a chord (of N work_task tasks) with a callback task to summarize the work.
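Roughly, the setup under test looks like this (a hedged sketch rather than the original code; the app instance and obj.items are assumptions):

from celery import Celery, chord

app = Celery("myapp")  # placeholder; the real project app is assumed

@app.task
def work_task(item):
    ...  # per-item work

@app.task
def summarize_task(results):
    ...  # chord callback that summarizes the work

@app.task
def start_task(obj):
    # a chord of N work_tasks with summarize_task as the callback
    chord(work_task.s(item) for item in obj.items)(summarize_task.s())

The test: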
def test_function(db):
    ...
    obj = make_obj
    ...
    start_task.delay(obj)
I call start_task, which creates a single work_task. The chord never completes, so summarize_task is never called. The work_task itself completes successfully (I can see that in the debugger). When I modify the test to:
def test_function(db, celery_app, celery_worker):
    ...
    obj = make_obj
    ...
    start_task.delay(obj)
The test dies on make_obj because the db connection is already closed.
E psycopg2.InterfaceError: connection already closed
My workaround for the moment is to call the tasks manually so that Celery is not involved, but this does not test the chord mechanisms, only the logic that is invoked by the chord.
If someone has an example, I would appreciate it.
It can be done using unittest-style tests with pytest; I haven't solved this using native pytest yet. The secret sauce below is to use a TransactionTestCase rather than a normal Django TestCase: TestCase wraps every test in a transaction that is rolled back rather than committed, so the worker's separate database connection never sees the test data, whereas TransactionTestCase actually commits.
import time

import pytest
from django.test import TransactionTestCase, override_settings
from celery.contrib.testing.worker import start_worker

from myproject.celery import app  # assumption: your project's Celery app
from myproject.tasks import do_average_in_chord  # assumption: the task under test


@pytest.mark.xdist_group(name="celery")
@override_settings(CELERY_TASK_ALWAYS_EAGER=False)
@override_settings(CELERY_TASK_EAGER_PROPAGATES=False)
class SyncTaskTestCase2(TransactionTestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.celery_worker = start_worker(app, perform_ping_check=False)
        cls.celery_worker.__enter__()
        print(f"Celery Worker started {time.time()}")

    @classmethod
    def tearDownClass(cls):
        print(f"Tearing down Superclass {time.time()}")
        super().tearDownClass()
        print(f"Tore down Superclass {time.time()}")
        cls.celery_worker.__exit__(None, None, None)
        print(f"Celery Worker torn down {time.time()}")

    def test_success(self):
        print(f"Starting test at {time.time()}")
        self.task = do_average_in_chord.delay()
        self.task.get()
        print(f"Finished Averaging at {time.time()}")
        assert self.task.successful()
cls.celery_worker.__exit__(None, None, None) takes about 9 seconds to complete, which is not particularly wonderful...
I have been trying to execute a pipeline using Celery. The initial task should create a list of items to process, I would then use a group to parallelize the processing of each item, and finally I should collect the results from the group task.
@app.task()
def prepare():
    return [item1, item2, item3]

@app.task()
def parallel_process(items, additional_param):
    # raises: kombu.exceptions.EncodeError: Object of type GroupResult is not JSON serializable
    return group(process.s(i, additional_param) for i in items)()

@app.task()
def process(i, param):
    return mapping_func(i, param)

@app.task()
def collect(results):
    print(results)

pipeline = prepare.s() | parallel_process.s(param) | collect.s()
pipeline.apply_async()
I get the error kombu.exceptions.EncodeError: Object of type GroupResult is not JSON serializable.
The process task gets called, but the collect task does not, and the final result never arrives. Is there another way of doing this? I could not find an appropriate example online.
According to your error message, you should change CELERY_RESULT_SERIALIZER to pickle, because the GroupResult type is not JSON serializable.
Reference: https://docs.celeryproject.org/en/stable/userguide/configuration.html#result-serializer
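Assuming the usual Django settings integration (namespace="CELERY"), that could look roughly like this:

# settings.py
CELERY_RESULT_SERIALIZER = "pickle"
# pickle must also be whitelisted as accepted content,
# otherwise workers will refuse to deserialize it
CELERY_ACCEPT_CONTENT = ["json", "pickle"]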
I am trying to comprehend Task scheduling principles in Monix.
The following code (source: https://slides.com/avasil/fp-concurrency-scalamatsuri2019#/4/3) produces only '1's, as expected.
val s1: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor()),
  ExecutionModel.SynchronousExecution)

def repeat(id: Int): Task[Unit] =
  Task(println(s"$id ${Thread.currentThread().getName}")) >> repeat(id)

val prog: Task[(Unit, Unit)] = (repeat(1), repeat(2)).parTupled
prog.runToFuture(s1)
// Output:
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// ...
When we add Task.sleep to the repeat method
def repeat(id: Int): Task[Unit] =
  Task(println(s"$id ${Thread.currentThread().getName}")) >>
    Task.sleep(1.millis) >> repeat(id)
the output changes to
// Output
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// ...
Both tasks are now executed concurrently on a single thread! Nice :)
Some cooperative yielding has kicked in. What exactly happened here? Thanks :)
EDIT: the same happens with Task.shift instead of Task.sleep.
I'm not sure if that's the answer you're looking for, but here it goes:
Although the naming suggests otherwise, Task.sleep cannot be compared to more conventional methods like Thread.sleep.
Task.sleep does not actually run on a thread; instead it simply instructs the scheduler to run a callback after the elapsed time.
Here's a little code snippet from monix/TaskSleep.scala for comparison:
[...]
implicit val s = ctx.scheduler
val c = TaskConnectionRef()
ctx.connection.push(c.cancel)
c := ctx.scheduler.scheduleOnce(
  timespan.length,
  timespan.unit,
  new SleepRunnable(ctx, cb)
)
[...]
private final class SleepRunnable(ctx: Context, cb: Callback[Throwable, Unit])
    extends Runnable {
  def run(): Unit = {
    ctx.connection.pop()
    // We had an async boundary, as we must reset the frame
    ctx.frameRef.reset()
    cb.onSuccess(())
  }
}
[...]
During the period before the callback (here: cb) is executed, your single-threaded scheduler (here: ctx.scheduler) can simply use its thread for whatever computation is queued next.
This also explains why this approach is preferable: we don't block threads during the sleep intervals, so fewer computation cycles are wasted.
Hope this helps.
To expand on Markus's answer: as a mental model (for illustration purposes), you can imagine the thread pool as a stack. Since you only have one executor thread pool, it will try to run repeat(1) first and then repeat(2).
Internally, everything is just one giant flatMap. The run loop schedules all the tasks based on the execution model.
What happens is that sleep schedules a runnable onto the thread pool. It pushes the runnable (repeat(1)) to the top of the stack, giving repeat(2) a chance to run. The same thing then happens with repeat(2).
Note that by default, Monix's execution model inserts an async boundary every 1024 flatMap operations.
I'm looking for a way in which a Task (i.e. outer scope) could execute a "subTask" using flatMap or something equivalent and make sure that any subsequent chained calls in the outer scope use the original scheduler.
Libraries and scala used:
scala - 2.12.4
monix - "io.monix" %% "monix" % "3.0.0-RC1"
cats - "org.typelevel" %% "cats-core" % "1.0.1"
Example code:
import monix.eval.Task
import monix.execution.Scheduler
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import monix.execution.Scheduler.Implicits.global
import cats.implicits._

object Test extends App {
  val io1 = Scheduler.io("io1")
  val io2 = Scheduler.io("io2")

  def taskEval(name: String) =
    Task.eval(println(s"Running eval Task [$name] on thread [${Thread.currentThread().getName}]"))

  def subTask: Task[Unit] =
    taskEval("subTaskScope").executeOn(io2)

  def outerScope(sub: Task[Unit]): Task[Unit] =
    taskEval("outerScopeBefore") *> sub *> taskEval("outerScopeAfter")

  def outerScopeTryProtect(sub: Task[Unit]): Task[Unit] =
    taskEval("outerScopeBefore") *> (sub <* Task.shift) *> taskEval("outerScopeAfter")

  val program1 = taskEval("programBefore").executeOn(io1) *> outerScope(subTask) *> taskEval("programAfter")
  val program2 = taskEval("programBefore").executeOn(io1) *> outerScopeTryProtect(subTask) *> taskEval("programAfter")

  Await.result(program1.runAsync, Duration.Inf)
  // Running eval Task [programBefore] on thread [io1-573]
  // Running eval Task [outerScopeBefore] on thread [io1-573]
  // Running eval Task [subTaskScope] on thread [io2-574]
  // Running eval Task [outerScopeAfter] on thread [io2-574] // << we don't shift back, so we are stuck with the scheduler forced by subTask
  // Running eval Task [programAfter] on thread [io2-574]

  println("------")

  Await.result(program2.runAsync, Duration.Inf)
  // Running eval Task [programBefore] on thread [io1-573]
  // Running eval Task [outerScopeBefore] on thread [io1-573]
  // Running eval Task [subTaskScope] on thread [io2-574]
  // Running eval Task [outerScopeAfter] on thread [scala-execution-context-global-575] // we shift the scheduler, but this restores the default one
  // Running eval Task [programAfter] on thread [scala-execution-context-global-575]
}
The subTask method wants to do some asynchronous work on a dedicated scheduler (io2), so it forces an async boundary and a scheduler using executeOn.
The outerScope method is executed in some program program1 and calls sub (i.e. subTask) using flatMap. Since it does not introduce any explicit async boundary, if subTask happens to change the scheduler (which it does), the rest of outerScope will use the scheduler set by subTask. For this reason the call to taskEval("outerScopeAfter") is executed on the io2 scheduler.
The outerScopeTryProtect method tries to protect its scheduler by introducing an async boundary (using Task.shift) after the flatMapped sub (i.e. subTask). However, the async boundary (Task.shift) resets the scheduler to the default one, which in this case goes all the way back to the scheduler used implicitly in program2.runAsync. This is not what we want: we would like to get back to the scheduler that was in use when taskEval("outerScopeBefore") was called, i.e. io1.
What I'm looking for is something like Task[A].flatMap[B](f: A => Task[B]): Task[B] that would execute the task produced by f however f specifies (possibly on a different scheduler), while the resulting Task of the flatMap call returns to the scheduler that Task[A] used before the flatMap.
I'm pretty sure this can only be done by creating my own task class, but I'd like to know if anyone else has found a way to do it.
Here is a full solution (works for Celery 4+):
import celery
from celery.task import task


class MyBaseClassForTask(celery.Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # exc (Exception) - The exception raised by the task.
        # args (Tuple) - Original arguments for the task that failed.
        # kwargs (Dict) - Original keyword arguments for the task that failed.
        print('{0!r} failed: {1!r}'.format(task_id, exc))


@task(name="foo:my_task", base=MyBaseClassForTask)
def add(x, y):
    raise KeyError()
Resources:
http://docs.celeryproject.org/en/latest/userguide/tasks.html#task-inheritance
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.on_failure
http://docs.celeryproject.org/en/latest/userguide/tasks.html#abstract-classes
You can provide the function directly to the decorator:

def fun(self, exc, task_id, args, kwargs, einfo):
    print('Failed!')

@task(name="foo:my_task", on_failure=fun)
def add(x, y):
    raise KeyError()