Background task in reactive pipeline (Fire-and-forget) - reactive-programming

I have a reactive pipeline to process incoming requests. For each request I need to call a business-relevant function (doSomeRelevantProcessing).
After that is done, I need to notify some external service about what happened. That part of the pipeline should not increase the overall response time.
Also, notifying this external system is not business critical: giving a quick response after the main part of the pipeline is finished is more important than making sure the notification is successful.
As far as I have learned, the only way to run something in the background without slowing down the overall process is to subscribe to it directly in the pipeline, thus achieving a fire-and-forget approach.
Is there a good alternative to subscribing inside the flatMap?
I am a little worried about what might happen if notifying the external service takes longer than the original processing and a lot of requests come in at once. Could this lead to memory exhaustion, or cause the overall process to block?
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers
import reactor.util.context.ContextView

fun runPipeline(incoming: Mono<Request>) = incoming
    .flatMap { doSomeRelevantProcessing(it) } // this should not be delayed
    .flatMap { doBackgroundJob(it) }          // this can take a moment, but is not super critical

fun doSomeRelevantProcessing(request: Request) = Mono.just(request) // do some processing

fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    // this uses non-blocking HTTP (i.e. WebClient), so it can take a second or so
    notifyExternalService(notification).subscribeOn(Schedulers.boundedElastic()).subscribe()
    Mono.just(Unit)
}

fun notifyExternalService(notification: String) = Mono.just(Unit) // might take a while

I'm answering this assuming that you notify the external service using purely reactive mechanisms - i.e. you're not wrapping a blocking service. If you are then the answer would be different as you're bound by the size of your bounded elastic thread pool, which could quickly become overwhelmed if you have hundreds of requests a second incoming.
(Assuming you're using reactive mechanisms, then there's no need for .subscribeOn(Schedulers.boundedElastic()) as you give in your example, as that's not buying you anything - it's designed for wrapping legacy blocking services.)
Could this lead to memory exhaustion?
It's only a possibility in really extreme cases; the memory used by each individual request will be tiny. It's almost certainly not worth worrying about, and if you start seeing memory issues here then you'll almost certainly be hit by other issues elsewhere.
That being said, I'd probably recommend adding .timeout(Duration.ofSeconds(5)) or similar before your inner subscribe call to make sure the requests are killed off after a while if they haven't completed for any reason - this will prevent them building up.
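For example, a minimal sketch of what that might look like, built on the doBackgroundJob from the question (the subscribeOn is dropped as discussed above, and java.time.Duration is assumed to be imported):
fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    notifyExternalService(notification)
        .timeout(Duration.ofSeconds(5)) // cancel the notification if it has not completed in time
        .subscribe(
            { /* success: nothing to do */ },
            { e -> println("notification failed: ${e.message}") } // not critical, just log and move on
        )
    Mono.just(Unit)
}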
...or [can this cause] the overall process to block?
This one is easier - a short no, it can't.

Related

How to choose in which ExecutorService the play ws is executed?

1)
I would like to execute specific Play WS calls in an isolated thread pool, because I'm going to have a lot of HTTP calls to do in the background and I don't want them to overload my main executor service.
Note: I also found some information that I don't understand here: https://groups.google.com/forum/#!topic/play-framework/ibDf2vL3sT0
It explains that Play WS already has its own thread pool. Is that still true in Play 2.6? I don't find this clearly explained in the Play documentation (see: https://www.playframework.com/documentation/2.5.x/ThreadPools#Using-the-default-thread-pool)
I created my own context:
call-to-db-context {
  fork-join-executor {
    parallelism-factor = 1
    parallelism-max = 24
  }
}
But I don't know how to specify that WS requests should use this context.
ws.url("http://127.0.0.1:8080/b")
  .get() // How to specify the executorContext here?
2)
Also, this call-to-db-context must have a low priority because it is for background tasks. I would like the Akka threads handling user requests, and my default execution context, to have a higher priority. What is the best way to do that?
In early Play it was easier: you could just configure the client yourself (since WS was a wrapper on top of Ning).
It looks a bit more complicated in 2.6. In mature products, when something is not easy to change, most probably you do not need to change it.
So I do not think that you need to specify a thread pool for the WS methods themselves. For post-processing, maybe, if it is long. But the Play client is asynchronous, which means it will not block a thread while waiting for the response. If you make requests over an unreliable network, just use timeouts.
See here for more about the Play client.
I'm not sure I understand your requirement about priority. Do you know about Akka deployment configuration? If not, you should read about it here.
So, you can specify thread pools for your actors. Having different dispatchers for the actors and for the WS post-processing (processing of data from the DB?) will separate these functionalities. If the post-processing of WS calls is heavy, limit the number of threads for your call-to-db-context dispatcher.
Update after comment
Lots of WS calls in your case (if you think their number could affect performance) need to be limited in two places: limit the number of calls started, and limit the number of concurrent post-processing steps. You need to understand that setting a specific dispatcher for WS itself will not limit anything: since it is async, it can start 1000 requests with only one thread.
So I would say you can wrap your WS calls in an actor. The actor will handle a message to start the request, as well as the post-processing, e.g.
receive: Receive = {
  …
  case Get(url) =>
    ws.url(url).get().onComplete {
      case Success(response) => self ! GetSuccess(response)
      case Failure(exception) => self ! GetFailure(exception)
    }
  case GetSuccess(response) => …..
  case GetFailure(exception) => ……
}
You can deploy this actor on a specific dispatcher with a round-robin pool (setting the number of workers). This solution does not limit the starting of requests (so you can still get a long queue of responses). You can add become so that the actor stops accepting Get until it has received the GetSuccess or GetFailure (so a worker has to process one request completely before starting the next).
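A rough sketch of what that could look like (the message and actor names follow the outline above; WsCallActor is an assumed name, and call-to-db-context must be declared as an Akka dispatcher, i.e. with type = Dispatcher and executor = "fork-join-executor", in application.conf):
import akka.actor.{Actor, ActorSystem, Props, Stash}
import akka.routing.RoundRobinPool
import play.api.libs.ws.{WSClient, WSResponse}
import scala.util.{Failure, Success}

case class Get(url: String)
case class GetSuccess(response: WSResponse)
case class GetFailure(exception: Throwable)

class WsCallActor(ws: WSClient) extends Actor with Stash {
  import context.dispatcher // callbacks and post-processing stay on this actor's dispatcher

  def receive: Receive = idle

  def idle: Receive = {
    case Get(url) =>
      ws.url(url).get().onComplete {
        case Success(response)  => self ! GetSuccess(response)
        case Failure(exception) => self ! GetFailure(exception)
      }
      context.become(busy) // stop accepting new Gets until this one has been handled

  }

  def busy: Receive = {
    case GetSuccess(response)  => /* post-process */ unstashAll(); context.become(idle)
    case GetFailure(exception) => /* log */ unstashAll(); context.become(idle)
    case _: Get                => stash() // queue further requests instead of starting them
  }
}

// Deployment: a fixed number of workers, all on the low-priority dispatcher.
def startPool(system: ActorSystem, ws: WSClient) =
  system.actorOf(
    RoundRobinPool(4).props(Props(new WsCallActor(ws)).withDispatcher("call-to-db-context")),
    "ws-pool")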

Using libevent together with GCD (libdispatch) in Swift

I'm creating a server-side app in Swift 3. I've chosen libevent for implementing the networking code because it's cross-platform and doesn't suffer from the C10k problem. Libevent implements its own event loop, but I want to keep CFRunLoop and GCD (DispatchQueue.main.after etc.) functional as well, so I need to glue them together somehow.
This is what I've come up with:
var terminated = false
DispatchQueue.main.after(when: DispatchTime.now() + 3) {
    print("Dispatch works!")
    terminated = true
}
while !terminated {
    switch event_base_loop(eventBase, EVLOOP_NONBLOCK) { // libevent
    case 1:
        break // No events were processed
    case 0:
        print("DEBUG: Libevent processed one or more events")
    default: // -1
        print("Unhandled error in network backend")
        exit(1)
    }
    RunLoop.current().run(mode: RunLoopMode.defaultRunLoopMode,
                          before: Date(timeIntervalSinceNow: 0.01))
}
This works, but introduces a latency of 0.01 sec. While RunLoop is sleeping, libevent won't be able to process events. Lowering this timeout increases CPU usage significantly when the app is idle.
I was also considering using only libevent, but third party libs in the project can use dispatch_async internally, so this can be problematic.
Running libevent's loop in a different thread makes synchronization more complex, is this the only way of solving this latency issue?
LINUX UPDATE. The above code does not work on Linux (2016-07-25-a Swift snapshot): RunLoop.current().run exits with an error. Below is a working Linux version reimplemented with a timer and dispatch_main. It suffers from the same latency issue:
let queue = dispatch_get_main_queue()
let timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue)
let interval = 0.01
let block: () -> () = {
    guard !terminated else {
        print("Quitting")
        exit(0)
    }
    switch server.loop() {
    case 1: break // Just idling
    case 0: break //print("Libevent: processed event(s)")
    default: // -1
        print("Unhandled error in network backend")
        exit(1)
    }
}
block()
let fireTime = dispatch_time(DISPATCH_TIME_NOW, Int64(interval * Double(NSEC_PER_SEC)))
dispatch_source_set_timer(timer, fireTime, UInt64(interval * Double(NSEC_PER_SEC)), UInt64(NSEC_PER_SEC) / 10)
dispatch_source_set_event_handler(timer, block)
dispatch_resume(timer)
dispatch_main()
A quick search of the Open Source Swift Foundation libraries on GitHub reveals that the support in CFRunLoop is (perhaps obviously) implemented differently on different platforms. This means, in essence, that RunLoop and libevent, with respect to cross-platform-ness, are just different ways to achieve the same thing. I can see the thinking behind the thought that libevent is probably better suited to server implementations, since CFRunLoop didn't grow up with that specific goal, but as far as being cross-platform goes, they're both barking up the same tree.
That said, the underlying synchronization primitives used by RunLoop and libevent are inherently private implementation details and, perhaps more importantly, different between platforms. From the source, it looks like RunLoop uses epoll on Linux, as does libevent, but on macOS/iOS/etc, RunLoop is going to use Mach ports as its fundamental primitive, but libevent looks like it's going to use kqueue. You might, with enough effort, be able to make a hybrid RunLoopSource that ties to a libevent source for a given platform, but this would likely be very fragile, and generally ill-advised, for a couple of reasons: Firstly, it would be based on private implementation details of RunLoop that are not part of the public API, and therefore subject to change at any time without notice. Second, assuming you didn't go through and do this for every platform supported by both Swift and libevent, you would have broken the cross-platform-ness of it, which was one of your stated reasons for going with libevent in the first place.
One additional option you might not have considered would be to use GCD by itself, without RunLoops. Look at the docs for dispatch_main. In a server application, there's (typically) nothing special about a "main thread," so dispatching to the "main queue", should be good enough (if needed at all). You can use dispatch "sources" to manage your connections, etc. I can't personally speak to how dispatch sources scale up to the C10K/C100K/etc. level, but they've seemed pretty lightweight and low-overhead in my experience. I also suspect that using GCD like this would likely be the most idiomatic way to write a server application in Swift. I've written up a small example of a GCD-based TCP echo server as part of another answer here.
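To give a flavor of that, here is a rough sketch of a GCD read source (not the echo server linked above; it uses the Swift 3 Dispatch overlay, assumes fd is an already-created non-blocking socket, and on the Linux snapshots of that era you may still have to fall back to the C libdispatch API as in the update above):
import Dispatch
#if os(Linux)
import Glibc
#else
import Darwin
#endif

// Wrap an existing non-blocking socket descriptor in a GCD read source.
func watch(socket fd: Int32, queue: DispatchQueue) -> DispatchSourceRead {
    let source = DispatchSource.makeReadSource(fileDescriptor: fd, queue: queue)
    source.setEventHandler {
        var buffer = [UInt8](repeating: 0, count: 1500)
        let n = recv(fd, &buffer, buffer.count, 0)
        if n > 0 {
            // hand buffer[0..<n] to the application layer, possibly on another queue
        }
    }
    source.resume()
    return source
}
// Keep the returned source alive, then park the main thread with dispatchMain().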
If you were bound and determined to use both RunLoop and libevent in the same application, it would, as you guessed, be best to give libevent its own separate thread, but I don't think it's as complex as you might think. You should be able to dispatch_async from libevent callbacks freely, and similarly marshal replies from GCD-managed threads to libevent using libevent's multi-threading mechanisms fairly easily (i.e. either by running with locking on, or by marshaling your calls into libevent as events themselves.) Similarly, third party libraries using GCD should not be an issue even if you chose to use libevent's loop structure. GCD manages its own thread pools and would have no way of stepping on libevent's main loop, etc.
You might also consider architecting your application such that it didn't matter what concurrency and connection handling library you used. Then you could swap out libevent, GCD, CFStreams, etc. (or mix and match) depending on what worked best for a given situation or deployment. Choosing a concurrency approach is important, but ideally you wouldn't couple yourself to it so tightly that you couldn't switch if circumstances called for it.
When you have such an architecture, I'm generally a fan of the approach of using the highest level abstraction that gets the job done, and only driving down to lower level abstractions when specific circumstances require it. In this case, that would probably mean using CFStreams and RunLoops to start, and switching out to "bare" GCD or libevent later, if you hit a wall and also determined (through empirical measurement) that it was the transport layer and not the application layer that was the limiting factor. Very few non-trivial applications actually get to the C10K problem in the transport layer; things tend to have to scale "out" at the application layer first, at least for apps more complicated than basic message passing.

play - how to wrap a blocking code with futures

I am trying to understand the difference between the 2 methods, in terms of functionality.
class MyService(blockService: BlockService) {
  def doSomething1(): Future[Boolean] = {
    // do
    // some non blocking
    // stuff
    val result = blockService.block()
    Future.successful(result)
  }

  def doSomething2(): Future[Boolean] = {
    Future {
      // do
      // some non blocking
      // stuff
      blockService.block()
    }
  }
}
To my understanding, the difference between the two is which thread actually ends up being blocked.
So if a thread thread_1 executes doSomething1, thread_1 will be the one that is blocked, while if thread_1 executes doSomething2, a new thread (thread_2) will run the body, and thread_2 is the one to be blocked.
Is this true?
If so, then is there really no preferred way to write this code? If I don't care which thread will eventually be blocked, then the end result will be the same.
doSomething1 seems like a weird way to write this code; I would choose doSomething2.
Make sense?
Yes, doSomething1 and doSomething2 block different threads, but depending on your scenario, this is an important decision.
As @AndreasNeumann said, you can have different execution contexts in doSomething2. Imagine that the main execution context is the one receiving HTTP requests from your users. Blocking threads in this context is bad because you can easily exhaust the execution context and impact requests that have nothing to do with doSomething.
Play docs have a better explanation about the possible problems with having blocking code:
If you plan to write blocking IO code, or code that could potentially do a lot of CPU intensive work, you need to know exactly which thread pool is bearing that workload, and you need to tune it accordingly. Doing blocking IO without taking this into account is likely to result in very poor performance from Play framework, for example, you may see only a few requests per second being handled, while CPU usage sits at 5%. In comparison, benchmarks on typical development hardware (eg, a MacBook Pro) have shown Play to be able to handle workloads in the hundreds or even thousands of requests per second without a sweat when tuned correctly.
In your case, both methods are being executed using Play's default thread pool. I suggest you take a look at the recommended best practices and see if you need a different execution context or not. I also suggest you read the Akka docs about Dispatchers and Futures to gain a better understanding of what it means to execute Futures and to have blocking/non-blocking code.
This approach makes sense if you make use of different execution contexts in the second method.
So you would have, for example, one for answering requests and another for blocking operations.
That way you use the normal Play execution context to keep your application running and answering, and run the blocking operations on a separate one.
def doSomething2(): Future[Boolean] = Future {
  blocking { blockService.block() }
}(mySpecialExecutionContextForBlockingOperations)
For a little more information: http://docs.scala-lang.org/overviews/core/futures.html#blocking
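For completeness, here is one way mySpecialExecutionContextForBlockingOperations could be obtained in Play (a sketch only: the dispatcher name blocking-io-dispatcher, its configuration and the injection style are assumptions, not something from the question):
import javax.inject.Inject
import akka.actor.ActorSystem
import scala.concurrent.{Future, blocking}

// application.conf (assumed):
// blocking-io-dispatcher {
//   type = Dispatcher
//   executor = "thread-pool-executor"
//   thread-pool-executor { fixed-pool-size = 16 }
// }
class MyService @Inject()(system: ActorSystem, blockService: BlockService) {
  private val mySpecialExecutionContextForBlockingOperations =
    system.dispatchers.lookup("blocking-io-dispatcher")

  def doSomething2(): Future[Boolean] = Future {
    blocking { blockService.block() }
  }(mySpecialExecutionContextForBlockingOperations)
}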
You are correct. I don't see a point in doSomething1. It simply complicates the interface for the caller while not providing the benefits of an asynchronous API.
Does BlockService handle a blocking operation?
Normally, using blocking, as @Andreas reminded, so that the blocking operation is handled on another thread, is the meaningful approach.

Boost Async UDP Client

I've read through the boost::asio documentation (which appears silent on async clients) and looked through here, but I can't seem to see the forest for the trees.
I've got a simulation that has a main loop that looks like this:
for (;;)
{
    a = do_stuff1();
    do_stuff2(a);
}
Easy enough.
What I'd like to do, is modify it so that I have:
for (;;)
{
    a = do_stuff1();
    check_for_new_received_udp_data(&b);
    modify_a_with_data_from_b(a, b);
    do_stuff2(a);
}
Where I have the following requirements:
I cannot lose data just because I wasn't actively listening. I.e. I don't want to lose packets because I was in do_stuff2() instead of check_for_new_received_udp_data() at the time the server sent the packet.
I can't have check_for_new_received_udp_data() block for more than about 2 ms, since the main for loop needs to execute at 60 Hz.
The server will be running elsewhere and has a completely erratic schedule. Sometimes there will be no data, other times I may get the same packet repeatedly.
I've played with the async UDP, but that requires calling io_service.run(), which blocks indefinitely, so that doesn't really help me.
I thought about timing out a blocking socket read, but it seems you have to cheat and get out of the boost calls to do that, so that's a non-starter.
Is the answer going to involve threading? Either way, could someone kindly point me to an example that is somewhat similar? Surely this has been done before.
To avoid blocking in the io_service::run() you can use io_service::poll_one().
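For illustration, a minimal sketch of how poll_one() could slot into the existing main loop (the do_stuff calls are the question's placeholders, shown as comments, and the async receive chain is assumed to have been started elsewhere with async_receive_from and a handler that re-arms itself):
#include <boost/asio.hpp>

int main()
{
    boost::asio::io_service io_service;
    // ... set up the UDP socket and start async_receive_from() with a handler
    //     that copies each datagram into `b` and re-arms itself ...
    for (;;)
    {
        // a = do_stuff1();
        while (io_service.poll_one() > 0)
        {
            // runs at most one ready completion handler, never blocks
        }
        // modify_a_with_data_from_b(a, b);
        // do_stuff2(a);
    }
}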
Regarding losing UDP packets, I think you are out of luck. UDP does not guarantee delivery, and any part of the network may decide to drop UDP packets if there is too much traffic. If you need to ensure delivery you need to either implement some sort of flow control or just use TCP.
I think your problem is that you're still thinking synchronously. You need to think asynchronously.
Async read on UDP socket - will call handler when data arrives.
Within that handler do your processing on the incoming data. Keep in mind that while you're processing, if you have a single thread, nothing else dispatches. This can be perfectly OK (UDP messages will still be queued in the network stack...).
As a result of this you could be starting other asynchronous operations.
If you need to do work in parallel that is essentially unrelated or offline that will involve threads. Create a thread that calls io_service.run().
If you need to do periodic work in an async framework, use timers.
In your particular example we can rearrange things like this (pseudo-code):
read_handler( ... )
{
    modify_a_with_data_from_b(a, b);
    do_stuff2(a);
    a = do_stuff1();
    udp->async_read( ..., read_handler );
}

periodic_handler( ... )
{
    // do periodic stuff
    timer.async_wait( ..., periodic_handler );
}

main()
{
    ...
    a = do_stuff1();
    udp->async_read( ..., read_handler );
    timer.async_wait( ..., periodic_handler );
    io_service.run();
}
Now I'm sure there are other requirements that aren't evident from your question, but you'll need to figure out an asynchronous answer to them; this is just an idea. Also ask yourself if you really need an asynchronous framework, or whether you can just use the synchronous socket APIs.

The reason why Task deletion of uCOS should not occur during ISR

I'm modifying some functionality (mainly scheduling) of uC/OS-II.
And I found out that the OSTaskDel function does nothing when it is called from an ISR.
Though I learned some basic features of OS design, I really don't understand why that should be prohibited.
All it does is withdraw the task from the ready list and release acquired resources like the TCB or semaphores...
Is there any reason for this to be banned while handling an interrupt?
It is not clear from the documentation why it is prohibited in this case, but OSTaskDel() explicitly calls OS_Sched(), and in an ISR this should only happen when the outer-most nested interrupt handler exits (handled by OSIntExit()).
I don't think the following is advisable, because there may be other reasons why this is prohibited, but you could remove the:
if (OSIntNesting > 0) {
    return (OS_TASK_DEL_ISR);
}
then make the OS_Sched() call conditional as follows:
if (OSIntNesting == 0) {
    OS_Sched();
}
If this dies horribly, remember I said it was ill-advised!
This operation will extend your interrupt processing time in any case so is probably a bad idea if only for that reason.
It is a bad idea in general (not just from an ISR) to asynchronously delete another task regardless of that task's state or resource usage. uC/OS-II provides the OSTaskDelReq() function to manage task deletion in a way that allows a task to delete itself on request and therefore be able to correctly release all its resources. Even without that, sending a request via the task's normal IPC mechanisms is usually better (and more portable).
If a task is not designed for self-deletion on demand, then you might simply use OSTaskSuspend().
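A rough sketch of the OSTaskDelReq() pattern (identifiers follow the classic uC/OS-II API; the priority, delays and exact constant names are illustrative and may differ between kernel versions):
#include "ucos_ii.h"

#define TARGET_TASK_PRIO  20u   /* illustrative priority of the task to delete */

/* Task (not an ISR) that wants TargetTask gone: it asks, then waits for confirmation. */
void RequestorTask(void *pdata)
{
    (void)pdata;
    for (;;) {
        if (OSTaskDelReq(TARGET_TASK_PRIO) == OS_NO_ERR) {
            /* wait until the target has released its resources and deleted itself */
            while (OSTaskDelReq(TARGET_TASK_PRIO) != OS_TASK_NOT_EXIST) {
                OSTimeDly(1);
            }
        }
        OSTimeDly(OS_TICKS_PER_SEC);   /* ... other work ... */
    }
}

/* Target task: polls for the request and deletes itself at a safe point. */
void TargetTask(void *pdata)
{
    (void)pdata;
    for (;;) {
        /* ... normal work ... */
        if (OSTaskDelReq(OS_PRIO_SELF) == OS_TASK_DEL_REQ) {
            /* release semaphores, buffers, etc. owned by this task first */
            OSTaskDel(OS_PRIO_SELF);
        }
        OSTimeDly(1);
    }
}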
Generally, you cannot do a few things in ISRs:
block on a semaphore and the like
block while acquiring a spin lock, if it's a single-CPU system
cause a page fault, that has to be resolved by the virtual memory subsystem (with virtual on-disk memory, that is)
If you do any of the above in an ISR, you'll have a deadlock.
OSTaskDel() is probably doing some of those things.