How do you test a celery chord in a django app from inside of pytest? - pytest

Using:
celery==5.2.7
django-celery-results==2.4.0
django==4.1
pytest==7.1.2
pytest-django==4.5.2
pytest-celery==0.0.0
I'm trying to test a task (start_task) that creates a chord (of N work_task tasks) with a callback task to summarize the work.
def test_function(db):
...
obj = make_obj
...
start_task.delay(obj)
I call start_task which creates a single work_task. The chord never
completes so that the summarize_task gets called. The work_task completes successfully (I can see that in the debugger). When I modify the test to:
def test_function(db, celery_app, celery_worker):
...
obj = make_obj
...
start_task.delay(obj)
The test dies on make_obj because the db connection is already closed.
E psycopg2.InterfaceError: connection already closed
My work around for the moment is to manually call tasks so that celery is not involved, but this does not test the chord mechanisms, only the logic that is invoked by the chord.
If someone has an example

It can be done using UnitTest style tests with pytest. I haven't solved this using native pytest yet. The secret sauce below is to use a TransactionTestCase vs a normal Django TestCase
from django.test import TransactionTestCase, override_settings
#pytest.mark.xdist_group(name="celery")
#override_settings(CELERY_TASK_ALWAYS_EAGER=False)
#override_settings(CELERY_TASK_EAGER_PROPAGATES=False)
class SyncTaskTestCase2(TransactionTestCase):
#classmethod
def setUpClass(cls):
super().setUpClass()
cls.celery_worker = start_worker(app, perform_ping_check=False)
cls.celery_worker.__enter__()
print(f"Celery Worker started {time.time()}")
#classmethod
def tearDownClass(cls):
print(f"Tearing down Superclass {time.time()}")
super().tearDownClass()
print(f"Tore down Superclass {time.time()}")
cls.celery_worker.__exit__(None, None, None)
print(f"Celery Worker torn down {time.time()}")
def test_success(self):
print(f"Starting test at {time.time()}")
self.task = do_average_in_chord.delay()
self.task.get()
print(f"Finished Averaging at {time.time()}")
assert self.task.successful()
cls.celery_work.__exit__(None, None, None) takes about 9 seconds to complete which is not particularly wonderful....

Related

Pytest setup class once before testing

I'm using pytest for testing with mixer library for generating model data. So, now I'm trying to setup my tests once before they run. I grouped them into TestClasses, set to my fixtures 'class' scope, but this doesn't work for me.
#pytest.mark.django_db
class TestCreateTagModel:
#classmethod
#pytest.fixture(autouse=True, scope='class')
def _set_up(cls, create_model_instance, tag_model, create_fake_instance):
cls.model = tag_model
cls.tag = create_model_instance(cls.model)
cls.fake_instance = create_fake_instance(cls.model)
print('setup')
def test_create_tag(self, tag_model, create_model_instance, check_instance_exist):
tag = create_model_instance(tag_model)
assert check_instance_exist(tag_model, tag.id)
conftest.py
pytest.fixture(scope='class')
#pytest.mark.django_db(transaction=True)
def create_model_instance():
instance = None
def wrapper(model, **fields):
nonlocal instance
if not fields:
instance = mixer.blend(model)
else:
instance = mixer.blend(model, **fields)
return instance
yield wrapper
if instance:
instance.delete()
#pytest.fixture(scope='class')
#pytest.mark.django_db(transaction=True)
def create_fake_instance(create_related_fields):
"""
Function for creating fake instance of model(fake means that this instance doesn't exists in DB)
Args:
related (bool, optional): Flag which indicates create related objects or not. Defaults to False.
"""
instance = None
def wrapper(model, related=False, **fields):
with mixer.ctx(commit=False):
instance = mixer.blend(model, **fields)
if related:
create_related_fields(instance, **fields)
return instance
yield wrapper
if instance:
instance.delete()
#pytest.fixture(scope='class')
#pytest.mark.django_db(transaction=True)
def create_related_fields():
django_rel_types = ['ForeignKey']
def wrapper(instance, **fields):
for f in instance._meta.get_fields():
if type(f).__name__ in django_rel_types:
rel_instance = mixer.blend(f.related_model)
setattr(instance, f.name, rel_instance)
return wrapper
But I'm catching exception in mixer gen_value method: Database access not allowed, use django_db mark(that I'm already use). Do you have any ideas how this can be implemented?
You can set things up once before a run by returning the results of the setup, rather than modifying the testing class directly. From my own attempts, it seems any changes to the class made within class-scope fixtures are lost when individual tests are run. So here's how you should be able to do this. Replace your _setup fixture with these:
#pytest.fixture(scope='class')
def model_instance(self, tag_model, create_model_instance):
return create_model_instance(tag_model)
#pytest.fixture(scope='class')
def fake_instance(self, tag_model, create_fake_instance):
return create_fake_instance(tag_model)
And then these can be accessed through:
def test_something(self, model_instance, fake_instance):
# Check that model_instance and fake_instance are as expected
I'm not familiar with Django myself though, so there might be something else with it going on. This should at least help you solve one half of the problem, if not the other.

Is synchronous HTTP request wrapped in a Future considered CPU or IO bound?

Consider the following two snippets where first wraps scalaj-http requests with Future, whilst second uses async-http-client
Sync client wrapped with Future using global EC
object SyncClientWithFuture {
def main(args: Array[String]): Unit = {
import scala.concurrent.ExecutionContext.Implicits.global
import scalaj.http.Http
val delay = "3000"
val slowApi = s"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = Future(Http(slowApi).asString).flatMap { _ =>
Future.sequence(List(
Future(Http(slowApi).asString),
Future(Http(slowApi).asString),
Future(Http(slowApi).asString)
))
}
time { Await.result(nestedF, Inf) }
}
}
Async client using global EC
object AsyncClient {
def main(args: Array[String]): Unit = {
import scala.concurrent.ExecutionContext.Implicits.global
import sttp.client._
import sttp.client.asynchttpclient.future.AsyncHttpClientFutureBackend
implicit val sttpBackend = AsyncHttpClientFutureBackend()
val delay = "3000"
val slowApi = uri"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = basicRequest.get(slowApi).send().flatMap { _ =>
Future.sequence(List(
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send()
))
}
time { Await.result(nestedF, Inf) }
}
}
The snippets are using
Slowwly to simulate slow API
scalaj-http
async-http-client sttp backend
time
The former takes 12 seconds whilst the latter takes 6 seconds. It seems the former behaves as if it is CPU bound however I do not see how that is the case since Future#sequence should executes the HTTP requests in parallel? Why does synchronous client wrapped in Future behave differently from proper async client? Is it not the case that async client does the same kind of thing where it wraps calls in Futures under the hood?
Future#sequence should execute the HTTP requests in parallel?
First of all, Future#sequence doesn't execute anything. It just produces a future that completes when all parameters complete.
Evaluation (execution) of constructed futures starts immediately If there is a free thread in the EC. Otherwise, it simply submits it for a sort of queue.
I am sure that in the first case you have single thread execution of futures.
println(scala.concurrent.ExecutionContext.Implicits.global) -> parallelism = 6
Don't know why it is like this, it might that other 5 thread is always busy for some reason. You can experiment with explicitly created new EC with 5-10 threads.
The difference with the Async case that you don't create a future by yourself, it is provided by the library, that internally don't block the thread. It starts the async process, "subscribes" for a result, and returns the future, which completes when the result will come.
Actually, async lib could have another EC internally, but I doubt.
Btw, Futures are not supposed to contain slow/io/blocking evaluations without blocking. Otherwise, you potentially will block the main thread pool (EC) and your app will be completely frozen.

Retrying Monix Task - why Task.defer is required here?

I recently spotted a case I can't fully understand while working with Monix Task:
There are two functions (in queue msg handler):
def handle(msg: RollbackMsg): Task[Unit] = {
logger.info(s"Attempting to rollback transaction ${msg.lockId}")
Task.defer(doRollback(msg)).onErrorRestart(5).foreachL { _ =>
logger.info(s"Transaction ${msg.lockId} rolled back")
}
}
private def doRollback(msg: RollbackMsg): Task[Unit] =
(for {
originalLock <- findOrigLock(msg.lockId)
existingClearanceOpt <- findExistingClearance(originalLock)
_ <- clearLock(originalLock, existingClearanceOpt)
} yield ()).transact(xa)
The internals of doRollback's for-comprehension are all a set of doobie calls returning ConnectionIO[_] monad and then transact is run on it turning the composition into Monix Task.
Now, as seen in handle function I'd like entire process to retry 5 times in case of failure. The mysterious part is that this simple call:
doRollback(msg).onErrorRestart(5)
doesn't really restart the operation on exception (verified in tests). In order to get this retry behaviour I have to explicitly wrap it in Task.defer, or have it already within Task "context" in any other way.
And this is the point I don't fully get: why is it so? doRollback already gives me Task instance, so I should be able to call onErrorRestart on it, no? If it's not the case how can I be sure that a Task instance i get from "somewhere" is ok to be restarted or not?
What am I missing here?

Testing rx-observables from Futures/Iterables

I have:
val observable: Observable[Int] = Observable.from(List(5))
and I can test that the input list is indeed passed on to the observable by testing:
materializeValues(observable) should contain (5)
where materializeValues is:
def materializeValues[T](observable: Observable[T]): List[T] = {
observable.toBlocking.toIterable.toList
}
Now, if I create an observable from a future, I can't seem to use materializeValues for the test as the test times out. So if I have:
val futVal = Future.successful(5)
val observable: Observable[Int] = Observable.from(futVal)
materializeValues(observable) should contain(5)
it times out and does not pass the test. What is different in the process of materializing these two observables, which leads to me not being able to block on it?
Also, what is the idomatic way of testing an observable? Is there any way of doing it without calling toBlocking?
I think the problem is that you use AsyncWordSpecLike (by the way why AsyncWordSpecLike instead of AsyncWordSpec?). AsyncWordSpecLike/AsyncWordSpec are designed to simplify testing Future. Unfortunately Observable is a more powerful abstraction that can't be easily mapped onto a Future.
Particularly AsyncWordSpecLike/AsyncWordSpec allow your tests to return Future[Assertion]. To make it possible it provides custom implicit ExecutionContext that it can force to execute everything and know when all scheduled jobs have finished. However the same custom ExecutionContext is the reason why your second code doesn't work: processing of the scheduled jobs starts only after execution of your test code has finished but your code blocks on the futVal because unlucklily for you callback registered in Future.onComplete is scheduled to be run on the ExecutionContext. It means that you have a kind of dead-lock with your own thread.
I'm not sure what is the official way to test Observable on Scala. In Java I think TestSubscriber is the suggested tool. As I said Observable is fundamentally more powerful thing than Future so I think to test Observable you should avoid using AsyncWordSpecLike/AsyncWordSpec. If you switch to use FlatSpec or WordSpec, you can do something like this:
class MyObservableTestSpec extends WordSpec with Matchers {
import scala.concurrent.ExecutionContext.Implicits.global
val testValue = 5
"observables" should {
"be testable if created from futures" in {
val futVal = Future.successful(testValue)
val observable = Observable.from(futVal)
val subscriber = TestSubscriber[Int]()
observable(subscriber)
subscriber.awaitTerminalEvent
// now after awaitTerminalEvent you can use various subscriber.assertXyz methods
subscriber.assertNoErrors
subscriber.assertValues(testValue)
// or you can use Matchers as
subscriber.getOnNextEvents should contain(testValue)
}
}
}

Sleeping actors?

What's the best way to have an actor sleep? I have actors set up as agents which want to maintain different parts of a database (including getting data from external sources). For a number of reasons (including not overloading the database or communications and general load issues), I want the actors to sleep between each operation. I'm looking at something like 10 actor objects.
The actors will run pretty much infinitely, as there will always be new data coming in, or sitting in a table waiting to be propagated to other parts of the database etc. The idea is for the database to be as complete as possible at any point in time.
I could do this with an infinite loop, and a sleep at the end of each loop, but according to http://www.scala-lang.org/node/242 actors use a thread pool which is expanded whenever all threads are blocked. So I imagine a Thread.sleep in each actor would be a bad idea as would waste threads unnecessarily.
I could perhaps have a central actor with its own loop that sends messages to subscribers on a clock (like async event clock observers)?
Has anyone done anything similar or have any suggestions? Sorry for extra (perhaps superfluous) information.
Cheers
Joe
There was a good point to Erlang in the first answer, but it seems disappeared. You can do the same Erlang-like trick with Scala actors easily. E.g. let's create a scheduler that does not use threads:
import actors.{Actor,TIMEOUT}
def scheduler(time: Long)(f: => Unit) = {
def fixedRateLoop {
Actor.reactWithin(time) {
case TIMEOUT => f; fixedRateLoop
case 'stop =>
}
}
Actor.actor(fixedRateLoop)
}
And let's test it (I did it right in Scala REPL) using a test client actor:
case class Ping(t: Long)
import Actor._
val test = actor { loop {
receiveWithin(3000) {
case Ping(t) => println(t/1000)
case TIMEOUT => println("TIMEOUT")
case 'stop => exit
}
} }
Run the scheduler:
import compat.Platform.currentTime
val sched = scheduler(2000) { test ! Ping(currentTime) }
and you will see something like this
scala> 1249383399
1249383401
1249383403
1249383405
1249383407
which means our scheduler sends a message every 2 seconds as expected. Let's stop the scheduler:
sched ! 'stop
the test client will begin to report timeouts:
scala> TIMEOUT
TIMEOUT
TIMEOUT
stop it as well:
test ! 'stop
There's no need to explicitly cause an actor to sleep: using loop and react for each actor means that the underlying thread pool will have waiting threads whilst there are no messages for the actors to process.
In the case that you want to schedule events for your actors to process, this is pretty easy using a single-threaded scheduler from the java.util.concurrent utilities:
object Scheduler {
import java.util.concurrent.Executors
import scala.compat.Platform
import java.util.concurrent.TimeUnit
private lazy val sched = Executors.newSingleThreadScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time , TimeUnit.MILLISECONDS);
}
}
You could extend this to take periodic tasks and it might be used thus:
val execTime = //...
Scheduler.schedule( { Actor.actor { target ! message }; () }, execTime)
Your target actor will then simply need to implement an appropriate react loop to process the given message. There is no need for you to have any actor sleep.
ActorPing (Apache License) from lift-util has schedule and scheduleAtFixedRate Source: ActorPing.scala
From scaladoc:
The ActorPing object schedules an actor to be ping-ed with a given message at specific intervals. The schedule methods return a ScheduledFuture object which can be cancelled if necessary
There unfortunately are two errors in the answer of oxbow_lakes.
One is a simple declaration mistake (long time vs time: Long), but the second is some more subtle.
oxbow_lakes declares run as
def run = actors.Scheduler.execute(f)
This however leads to messages disappearing from time to time. That is: they are scheduled but get never send. Declaring run as
def run = f
fixed it for me. It's done the exact way in the ActorPing of lift-util.
The whole scheduler code becomes:
object Scheduler {
private lazy val sched = Executors.newSingleThreadedScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time - Platform.currentTime, TimeUnit.MILLISECONDS);
}
}
I tried to edit oxbow_lakes post, but could not save it (broken?), not do I have rights to comment, yet. Therefore a new post.