asyncio: 50ms delay between put_nowait and get to/from a queue

I have a python asyncio application, with several coroutines being run within a single thread. Some data is being passed using queues.
The queue consumer looks like this:
async def queue_consumer(q):
    """Consume from an asyncio.Queue, making it an async iterable"""
    while True:
        try:
            e = await q.get()
            yield e
        except:
            continue
The consumer is iterated over with async for.
In this particular case the coroutine which consumes from a specific queue sequentially calls some code which puts data into its queue with put_nowait.
EDIT: In this particular case coroutine A, which is listening for inbound network traffic, puts a message into the queue of coroutine B.
I have noticed that there is a consistent ~50ms delay between a call to put_nowait in coroutine A and then the data being processed as a result of pulling it from queue async iterable in coroutine B.
I suspect it might have something to do with some asyncio internal polling resolution, but I am not sure, nor do I know where such a configuration could be modified.
I would be very much interested in increasing event polling frequency in the asyncio loop, hence, decreasing observed delay between put_nowait and get to/from a queue between coroutines. Maybe also there's a way to hint asyncio framework to process items from the queue earlier?
NB: the application I am working with is not doing any computationally demanding work.

It turns out the problem was caused by my app doing some UI updates with prompt_toolkit. I tracked this down by placing some measurements within _run_once. Anyway, the queue was not being processed because the event loop was busy executing some UI code that I did not expect to take so much time.
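A minimal sketch of the effect described above, not the original application: the producer coroutine stands in for the slow UI code by calling time.sleep, which blocks the event loop. The names consumer, producer and measure, and the 50 ms figure, are assumptions chosen for illustration.

```python
import asyncio
import time


async def consumer(q, latencies):
    """Record how long each item waited between put_nowait and get."""
    while True:
        sent_at = await q.get()
        if sent_at is None:  # sentinel: stop consuming
            break
        latencies.append(time.perf_counter() - sent_at)


async def producer(q, busy_seconds):
    """Put a timestamp on the queue, then hog the loop like slow UI code would."""
    q.put_nowait(time.perf_counter())
    time.sleep(busy_seconds)  # blocking call: no other coroutine runs meanwhile
    await asyncio.sleep(0)    # yield so the consumer can finally run
    q.put_nowait(None)


async def measure(busy_seconds):
    """Return the put-to-get latency of the first item."""
    q = asyncio.Queue()
    latencies = []
    await asyncio.gather(consumer(q, latencies), producer(q, busy_seconds))
    return latencies[0]


blocked = asyncio.run(measure(0.05))   # loop is busy for 50 ms after the put
unblocked = asyncio.run(measure(0.0))  # loop is free right after the put
```

In this sketch blocked comes out at roughly the time the loop was kept busy, while unblocked is far smaller, which points at whatever else occupies the loop rather than at a polling interval. asyncio's debug mode (asyncio.run(..., debug=True)) logs callbacks that run longer than loop.slow_callback_duration, which is 0.1 s by default, so that threshold would have to be lowered to catch a 50 ms stall.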

Related

Scala's Future and ExecutionContext Execution

Let's say I have the following set of code that does something in a Future:
1 to 10 foreach {
  case x => Future { x + x }
}
Assuming that I give the default ExecutionContext to this piece of code, I know what happens in the background, but what I want to know is how is the handling of the Future actually done? I mean there should be some thread or a set of threads that should potentially be waiting for the Future to finish? Are these threads blocked? blocked in the sense where they are literally waiting for the Future to finish?
Now in the following scenario:
val x: Future[MyType] = finishInSomeFuture()
Assuming that x has a timeout that I can call like this:
Future {
  blocking {
    x.get(3, TimeOut.SECONDS)
  }
}
Am I really blocking? Is there a better way to timeout asynchronously?
EDIT: How is the following TimeoutFuture different from, or better than, the blocking context that I defined above?
object TimeoutFuture {
  def apply[A](timeout: FiniteDuration)(block: => A): Future[A] = {
    val prom = promise[A]
    // timeout logic
    Akka.system.scheduler.scheduleOnce(timeout) {
      prom tryFailure new java.util.concurrent.TimeoutException
    }
    // business logic
    Future {
      prom success block
    }
    prom.future
  }
}
Let's say I have the following set of code that does something in a Future:
1 to 10 foreach {
  case x => Future { x + x }
}
...
Your piece of code creates ten Futures that are immediately scheduled for execution on threads provided by the implicit ExecutionContext. As you don't store references to your futures and don't await their completion, your main thread (where your foreach is defined) doesn't block and continues its execution immediately. If that piece of code were at the end of the main method, then, depending on whether the ThreadFactory in the ExecutionContext produces daemon threads, the program may exit without waiting for the Futures to finish.
Now in the following scenario:
val x: Future[MyType] = finishInSomeFuture()
Assuming that x has a timeout that I can call like this:
Future {
  blocking {
    x.get(3, TimeOut.SECONDS)
  }
}
Am I really blocking? Is there a better way to timeout asynchronously?
You probably meant Await.result instead of x.get:
def inefficientTimeoutFuture[T](f:Future[T], x:Duration) = Future { Await.result(f, x) }
In this case future f will be calculated in a separate thread, while an additional thread will be blocked waiting for the calculation of f.
Using a scheduler to create TimeoutFuture is more efficient, as schedulers usually share a fixed number of threads (often one), while blocking in Await.result will always require an additional thread to block.
I would like to know how I could timeout without blocking?
Using a scheduler to create TimeoutFuture allows you to time out an operation without blocking. You are wrapping your Future in a timeout helper, and the new Future either completes successfully or fails due to the timeout (whichever comes first). The new Future has the same asynchronous nature, and it's up to you how to use it (register onComplete callbacks or synchronously wait for the result, blocking the main thread).
UPD I'll try to clarify some fundamental things about multithreading and blocking.
Right now asynchronous non-blocking approach is the trend, but you have to understand what blocking means and why it should be avoided.
Each thread in Java comes at a cost. First, it's relatively expensive to create a new Thread (that's why thread pools exist), and second, it consumes memory. Why not CPU? Because your CPU resources are limited by the number of cores you have. It doesn't matter how many active threads you have; your parallelism level will always be capped by the number of cores. And if a thread is inactive (blocked), it doesn't consume CPU.
In contemporary Java applications you can create a fairly large number of threads (thousands of them). The problem is that in some cases you can't predict how many threads you're gonna need. That's when the asynchronous approach comes into play. It says: instead of blocking the current thread while some other thread(s) do their job, let's wrap our next steps in a callback and return the current thread to the pool, so it can do some other useful work. So almost all threads are busy doing actual work instead of just waiting and consuming memory.
Now to the example of the timer. If you use the netty-based HashedWheelTimer, you can have it backed by a single thread and have thousands of events scheduled. When you create a Future that is blocked waiting for a timeout, you occupy one thread per "schedule". So if you have a thousand timeouts scheduled, you'll end up with a thousand blocked threads (which again consume memory, not CPU).
Now your "main" future (the one you want to wrap in a timeout) doesn't have to block a thread either. For example, if you perform a synchronous HTTP request inside the future, your thread will be blocked, but if you use the netty-based AsyncHttpClient (for example), you can use a promise-based future that doesn't occupy a thread. And in this case you can have a small fixed number of threads that process any number of requests (hundreds of thousands).
UPD2
But there should be some thread that should be blocking even in case of the Timer as it has to wait for the Timeout millis. So what and where is the benefit? I still block, but may be I block less in the Timer case or?
This is true only for one particular scenario: when you have a main thread that waits for an asynchronous task to complete. In this case you're right: there is no way to wrap the operation in a timeout without blocking the main thread, and it doesn't make any sense to use timers. You just need an additional thread to perform your operation while the main thread waits for the result or the timeout.
But usually Futures are used in more complex scenarios, where there is no "main" thread. For example, imagine an asynchronous web server: a request comes in, you create a Future to process it and register a callback to reply. There is no "main" thread waiting for anything.
Or another example: you want to make 1000 requests to an external service with individual timeouts and then gather all results in one place. If you have an asynchronous client for that service, you create those 1000 requests, wrap them in asynchronous timeouts and then combine them into one Future. You can block the main thread to wait for that future to complete or register a callback to print the result, but you don't have to create 1000 threads just to wait for each individual request to complete.
So, the point is: if you already have a synchronous flow and you want to wrap some part of it in a timeout, the only thing you can do is to have your current thread blocked until other thread(s) perform the job.
If you want to avoid blocking, you need to use asynchronous approach from the start.
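The 1000-requests scenario can be sketched outside Scala as well. The following is an illustration in Python's asyncio, not the Akka/Scala code discussed above: fake_request is a hypothetical stand-in for an asynchronous client call, and the delays and the 0.5 s timeout are arbitrary assumptions. Each request gets its own timer-based timeout, and no thread sits blocked waiting for any of them.

```python
import asyncio


async def fake_request(i):
    """Hypothetical stand-in for an async client call; every 10th one is slow."""
    await asyncio.sleep(5.0 if i % 10 == 0 else 0.01)
    return i


async def with_timeout(i, timeout):
    """Wrap one request in a timer-based timeout; nothing blocks while waiting."""
    try:
        return await asyncio.wait_for(fake_request(i), timeout)
    except asyncio.TimeoutError:
        return None  # marker for a request that timed out


async def main():
    # Combine 1000 individually timed-out requests into one awaitable result.
    return await asyncio.gather(*(with_timeout(i, 0.5) for i in range(1000)))


results = asyncio.run(main())
timed_out = sum(r is None for r in results)
```

All 1000 requests run on a single thread here; the slow ones are cancelled by their timers and show up as None in results.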

Where to put calculation executed regularly that updates actor's internal state?

I am learning Scala and Akka.
In the problem I am trying to solve I want an actor to be reading a real-time data stream and perform a certain calculation that would update its state.
Every 3 seconds I am sending a request through a Scheduler for the actor to return to its state.
I have pretty much everything implemented: my actor has a broadcaster and a receiver, and the function that updates the state works. What I am not sure about is where to run the calculation. I could potentially keep the calculations running in a separate thread inside the actor, but I would like to know if there is a more elegant way to do this in Scala.
I would suggest dividing the work between two actors. The parent actor would manage a child worker actor and would track the state. It sends a message to the child worker actor to trigger data processing.
The child worker actor processes the data stream. Don't forget to wrap the processing in a Future so that it doesn't block the actor from processing messages. It also periodically sends messages to the parent with the current state. So the child worker is stateless; it sends notifications when its state changes.
If you want to know the current state of the work overall, you ask the parent. In principle, you can merge this into one actor which sends the status message to itself. I wouldn't update the state directly, to avoid concurrency issues. The reason is that the data processing work running in the Future can possibly run on a different thread than the message processing.

How to retry hot observable?

Rx has great function Observable.Buffer. But there is a problem with it in real life.
Scenario: application sends a stream of events to a database. Inserting events one-by-one is expensive, so we need to batch it. I want to use Observable.Buffer for this. But inserting into DB has small probability of failure (deadlocks, timeouts, downtime, etc).
I can add some retry logic into the batching function itself, but that would go against the Rx idea of composability. Observable.Retry does not cut it, because it will re-subscribe to the "hot" source, which means that the failed batch will be lost.
Are there functions, which I can compose to achieve desired effect, or do I need to implement my own extension? I would like something like this:
_inputBuffer = new BufferBlock<int>();
_inputBuffer.AsObservable().
    Buffer(TimeSpan.FromSeconds(10), 1000).
    Do(batch => SqlSaveBatch(batch)).
    {Retry???}.
    Subscribe()
To make it perfect, I would like to be able to get control over situation when OnComplete is called, while retry buffer has incomplete batches, and be able to perform some actions (send error email, save data to local file system, etc.)
When a save to the database fails and needs to be retried, it's not really the stream or the events that are in error; it's an action taken against an event.
I would structure your code more like this:
IDisposable subscription =
    _inputBuffer.AsObservable().
        Buffer(TimeSpan.FromSeconds(10), 1000).
        Subscribe(
            batch => SqlSaveBatchWithRetryLogic(batch),
            () => YourOnCompleteAction());
You can provide the retry logic inside of SqlSaveBatchWithRetryLogic()
Handle OnComplete of the events inside YourOnCompleteAction()
You can elect to dispose the subscription from within SqlSaveBatchWithRetryLogic() if you fail to save a batch.
This also removes the Do side effect.
I would be careful about this approach though - you need to watch the retry logic. You have no back-pressure (way to slow down the input). So if you have any kind of back-off/retry you are risking the queue backing up and filling memory. If you start seeing batches consistently at the count limit, you are probably in trouble! You may want to implement a counter to monitor the outstanding items.

Rx - several producers/one consumer

Have been trying to google this but getting a bit stuck.
Let's say we have a class that fires an event, and that event could be fired by several threads at the same time.
Using Observable.FromEventPattern, we create an Observable and subscribe to that event. How exactly does Rx manage multiple such events being fired at once? Let's say we have 3 events fired in quick succession on different threads. Does it queue them internally, and then call the Subscribe delegate synchronously for each one? Let's say we were subscribing on a thread pool; can we still guarantee the subscriptions would be processed separately in time?
Following on from that, let's say for each event, we want to perform an action, but it's a method that's potentially not thread safe, so we only want one thread to be in this method at a time. Now I see we can use an EventLoop Scheduler, and presumably we wouldn't need to implement any locking on the code?
Also, would observing on the Current Thread be an option? Is Current Thread the thread that the event was fired from, or the thread the subscription was set up on? I.e. is that current thread guaranteed to always be the same, or could we have 2 threads end up in the method at the same time?
Thx
PS: I put an example together but I always seem to end up on the same thread in my subscribe method, even when I ObserveOn the thread pool, which is confusing :S
PSS: From doing a few more experiments, it seems that if no Schedulers are specified, then RX will just execute on whatever thread the event was fired on, meaning it processes several concurrently. As soon as I introduce a scheduler, it always runs things consecutively, no matter what the type of the scheduler is. Strange :S
According to the Rx Design Guidelines, an observable should never call OnNext of an observer concurrently. It will always wait for the current call to complete before making the next call. All Rx methods honor this convention. And, more importantly, they assume you also honor this convention. When you violate this condition, you may encounter subtle bugs in the behavior of your Observable.
For those times when you have source data that does not honor this convention (ie it can produce data concurrently), they provide Synchronize.
Observable.FromEventPattern assumes you will not be firing concurrent events and so does nothing to prevent concurrent downstream notifications. If you plan on firing events from multiple threads, sometimes concurrently, then use Synchronize() as the first operation you do after FromEventPattern:
// this will get you in trouble if your event source might fire events concurrently.
var events = Observable.FromEventPattern(...).Select(...).GroupBy(...);
// this version will protect you in that case.
var events = Observable.FromEventPattern(...).Synchronize().Select(...).GroupBy(...);
Now all of the downstream operators (and eventually your observer) are protected from concurrent notifications, as promised by the Rx Design Guidelines. Synchronize works by using a simple mutex (aka the lock statement). There is no fancy queueing or anything. If one thread attempts to raise an event while another thread is already raising it, the 2nd thread will block until the first thread finishes.
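The mutex behaviour described above can be illustrated with a small sketch. This is Python rather than C#, and SynchronizedObserver is a hypothetical class written for this illustration, not part of Rx: it only shows the idea of serializing concurrent notifications with a plain lock, with no queueing.

```python
import threading
import time


class SynchronizedObserver:
    """Hypothetical wrapper: lets only one thread at a time into on_next."""

    def __init__(self, on_next):
        self._on_next = on_next
        self._lock = threading.Lock()

    def on_next(self, value):
        # A second caller blocks here until the first one has returned.
        with self._lock:
            self._on_next(value)


inside = 0      # handlers currently executing
max_inside = 0  # highest concurrency ever observed
seen = []


def handler(value):
    """Deliberately not thread-safe; relies on the wrapper for exclusion."""
    global inside, max_inside
    inside += 1
    max_inside = max(max_inside, inside)
    time.sleep(0.001)  # widen the window in which an overlap could occur
    seen.append(value)
    inside -= 1


observer = SynchronizedObserver(handler)
threads = [threading.Thread(target=observer.on_next, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty threads raise notifications at once, yet the handler never observes more than one caller inside it, because every other thread waits on the lock.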
In addition to the recommendation to use Synchronize, it's probably worth having a read of the Intro to Rx section on scheduling and threading. It covers the different schedulers and their relationship to threads, as well as the differences between ObserveOn and SubscribeOn, etc.
If you have several producers, then there are Rx methods for combining them in a thread-safe way.
For combining streams of the same type of event into a single stream
Observable.Merge
For combining stream of different types of events into a single stream using a selector to transform the latest value on each stream into a new value.
Observable.CombineLatest
For example combining stock prices from different sources
IObservable<StockPrice> source0;
IObservable<StockPrice> source1;
IObservable<StockPrice> combinedSources = source0.Merge(source1);
or create balloons at the current position every time there is a click
IObservable<ClickEvent> clicks;
IObservable<Position> positions;
IObservable<Balloon> balloons = clicks
    .CombineLatest
        ( positions
        , (click, position) => new Balloon(position.X, position.Y)
        );
To make this specifically relevant to your question you say there is a class which combines events from different threads. Then I would use Observable.Merge to combine the individual event sources and expose that as an Observable on your main class.
BTW if your threads are actually tasks that are firing events to say they have completed, here is an interesting pattern:
IObservable<Job> jobSource;
IObservable<IObservable<JobResult>> resultTasks = jobSource
    .Select(job => Observable.FromAsync(token => DoJob(token, job)));
IObservable<JobResult> results = resultTasks.Merge();
What is happening here is that you get a stream of jobs in. From the jobs you create a stream of asynchronous tasks (not running yet). Merge then runs the tasks and collects the results. It is an example of a mapreduce algorithm. The cancellation token can be used to cancel running async tasks if the observable is unsubscribed from (i.e. cancelled).

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
1. Can 1 actor be processing multiple messages simultaneously?
2. If an actor makes a blocking call in a future (ie. future.get()) does that block only the current actor's execution; or will it prevent execution on all actors until the blocking call has completed?
3. If it blocks all execution, how does using a future assist concurrency (ie. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
4. What is the best way to deal with a multi-staged process (ie. read from the database; call a blocking webservice; read from the database; write to the database) where each step is dependent on the last?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can do blocking calls from an Actor (which you probably want to minimize, but that's another story).
1. No, 1 actor instance cannot.
2. It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (normally the global event-driven one), so they run on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread-based) dispatcher to futures and actors, but that would take some intent on your part. I guess if you have a huge number of futures blocking indefinitely and the ExecutorService has been configured with a fixed number of threads, you could blow up the ExecutorService. So a lot of 'ifs'. An f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary, by the way).
3. You do not necessarily have to block. You can use a callback instead of f.get. You can even compose Futures without blocking. Check out the talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
4. I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a one-way message to the next, possibly also one-way messages to some other actor that will not block and that can supervise the process. This way you could create chains of actors, of which you could make many. In front of them you could put a load-balancing actor, so that if one actor blocks in one chain, another of the same type might not in the other chain. That would also work for your 'context' question: pass off the workload to local actors and chain them up behind a load-balancing actor.
As for netty (and I assume you mean Remote Actors, because this is the only thing that netty is used for in Akka), pass off your work as soon as possible to a local actor or a future (with a callback) if you are worried about timing or about preventing netty from doing its job in some way.
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using the !! or !!! send methods) can throw a time-out exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful time-out value and prefer callbacks when possible.
1. An akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several messages (i.e. its receive method will be called several times sequentially) if its message queue is not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
2. Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (the recommended usage), one of the threads of the dispatcher will be blocked until operations resume. So try composing futures as much as possible (and beware of the time-out value).
3 and 4. I agree with Raymond's answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
import scala.concurrent.blocking
...
Future {
  val image = blocking { load_image_from_potentially_slow_media() }
  val enhanced = image.enhance()
  blocking {
    if (oracle.queryBetter(image, enhanced)) {
      write_new_image(enhanced)
    }
  }
  enhanced
}
Documentation is here.