System.Reactive.Concurrency.DefaultScheduler - system.reactive

In my application I've written all of my Rx code to use Scheduler.Default.
I wanted to know if there's a difference between specifying Scheduler.Default and not specifying a scheduler at all?
What is the strategy employed by System.Reactive.Concurrency.DefaultScheduler?

Rx uses an appropriate strategy dependent on the platform specific PlatformServices that are loaded - hence you can have a different approach in different cases. The OOB implementation looks at whether Threads are available on your platform, and if so uses Threads and the platform Timer implementation to schedule items, otherwise it uses Tasks. The later case arises in Windows 8 Apps, for example.
You can find a good video about how platform services are implemented from the creator here: http://channel9.msdn.com/Shows/Going+Deep/Bart-De-Smet-Rx-20-RTM-and-RTW
Look here for information about how the built-in operators behave when you do and don't specify a scheduler: http://msdn.microsoft.com/en-us/library/hh242963(v=vs.103).aspx

Yes there is a difference between specifying Scheduler.Default and not specifying a scheduler. Using Scheduler.Default will introduce asynchronous and possibly concurrent behavior, while not supplying a scheduler leaves it up to the discretion of the operator. Some operators will choose to execute synchronously while others will execute asynchronously, while others will choose to jump threads.
It is probably a bad idea (for performance and possibly even correctness since too much concurrency might lead you into a deadlock situation) to supply Scheduler.Default to every Rx operator. If you do not have a specific scheduling requirement, then do not supply a scheduler and let the operator pick what it needs.
For example,
this will complete synchronously:
int result = 0;
Observable.Return(42).Subscribe(v => result = v);
result == 42;
while this will complete asynchronously (and likely on another thread):
int result = 0;
Observable.Return(42, Scheduler.Default).Subscribe(v => result = v);
result == 0;
// some time later
result == 42;

Related

Background task in reactive pipeline (Fire-and-forget)

I have a reactive pipeline to process incoming requests. For each request I need to call a business-relevant function (doSomeRelevantProcessing).
After that is done, I need to notify some external service about what happened. That part of the pipeline should not increase the overall response time.
Also, notifying this external system is not business critical: giving a quick response after the main part of the pipeline is finished is more important than making sure the notification is successful.
As far as I learned, the only way to run something in the background without slowing down the overall process is to subscribe to in directly in the pipeline, thus achieving a fire-and-forget mentality.
Is there a good alternative to subscribing inside the flatmap?
I am a little worried about what might happen if notifying the external service takes longer than the original processing and a lot of requests are coming in at once. Could this lead to a memory exhaustion or the overall process to block?
fun runPipeline(incoming: Mono<Request>) = incoming
.flatMap { doSomeRelevantProcessing(it) } // this should not be delayed
.flatMap { doBackgroundJob(it) } // this can take a moment, but is not super critical
fun doSomeRelevantProcessing(request: Request) = Mono.just(request) // do some processing
fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
val notification = "notification" // build an object from context
// this uses non-blocking HTTP (i.e. webclient), so it can take a second or so
notifyExternalService(notification).subscribeOn(Schedulers.boundedElastic()).subscribe()
Mono.just(Unit)
}
fun notifyExternalService(notification: String) = Mono.just(Unit) // might take a while
I'm answering this assuming that you notify the external service using purely reactive mechanisms - i.e. you're not wrapping a blocking service. If you are then the answer would be different as you're bound by the size of your bounded elastic thread pool, which could quickly become overwhelmed if you have hundreds of requests a second incoming.
(Assuming you're using reactive mechanisms, then there's no need for .subscribeOn(Schedulers.boundedElastic()) as you give in your example, as that's not buying you anything - it's designed for wrapping legacy blocking services.)
Could this lead to a memory exhaustion
It's only a possibility in really extreme cases, the memory used by each individual request will be tiny. It's almost certainly not worth worrying about, if you start seeing memory issues here then you'll almost certainly be hit by other issues elsewhere.
That being said, I'd probably recommend adding .timeout(Duration.ofSeconds(5)) or similar before your inner subscribe method to make sure the requests are killed off after a while if they haven't worked for any reason - this will prevent them building up.
...or [can this cause] the overall process to block?
This one is easier - a short no, it can't.

Using libevent together with GCD (libdispatch) in Swift

I'm creating a server side app in Swift 3. I've chosen libevent for implementing networking code because it's cross-platform and doesn't suffer from C10k problem. Libevent implements it's own event loop, but I want to keep CFRunLoop and GCD (DispatchQueue.main.after etc) functional as well, so I need to glue them somehow.
This is what I've came up with:
var terminated = false
DispatchQueue.main.after(when: DispatchTime.now() + 3) {
print("Dispatch works!")
terminated = true
}
while !terminated {
switch event_base_loop(eventBase, EVLOOP_NONBLOCK) { // libevent
case 1:
break // No events were processed
case 0:
print("DEBUG: Libevent processed one or more events")
default: // -1
print("Unhandled error in network backend")
exit(1)
}
RunLoop.current().run(mode: RunLoopMode.defaultRunLoopMode,
before: Date(timeIntervalSinceNow: 0.01))
}
This works, but introduces a latency of 0.01 sec. While RunLoop is sleeping, libevent won't be able to process events. Lowering this timeout increases CPU usage significantly when the app is idle.
I was also considering using only libevent, but third party libs in the project can use dispatch_async internally, so this can be problematic.
Running libevent's loop in a different thread makes synchronization more complex, is this the only way of solving this latency issue?
LINUX UPDATE. The above code does not work on Linux (2016-07-25-a Swift snapshot), RunLoop.current().run exists with an error. Below is a working Linux version reimplemented with a timer and dispatch_main. It suffers from the same latency issue:
let queue = dispatch_get_main_queue()
let timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue)
let interval = 0.01
let block: () -> () = {
guard !terminated else {
print("Quitting")
exit(0)
}
switch server.loop() {
case 1: break // Just idling
case 0: break //print("Libevent: processed event(s)")
default: // -1
print("Unhandled error in network backend")
exit(1)
}
}
block()
let fireTime = dispatch_time(DISPATCH_TIME_NOW, Int64(interval * Double(NSEC_PER_SEC)))
dispatch_source_set_timer(timer, fireTime, UInt64(interval * Double(NSEC_PER_SEC)), UInt64(NSEC_PER_SEC) / 10)
dispatch_source_set_event_handler(timer, block)
dispatch_resume(timer)
dispatch_main()
A quick search of the Open Source Swift Foundation libraries on GitHub reveals that the support in CFRunLoop is (perhaps obviously) implemented differently on different platforms. This means, in essence, that RunLoop and libevent, with respect to cross-platform-ness, are just different ways to achieve the same thing. I can see the thinking behind the thought that libevent is probably better suited to server implementations, since CFRunLoop didn't grow up with that specific goal, but as far as being cross-platform goes, they're both barking up the same tree.
That said, the underlying synchronization primitives used by RunLoop and libevent are inherently private implementation details and, perhaps more importantly, different between platforms. From the source, it looks like RunLoop uses epoll on Linux, as does libevent, but on macOS/iOS/etc, RunLoop is going to use Mach ports as its fundamental primitive, but libevent looks like it's going to use kqueue. You might, with enough effort, be able to make a hybrid RunLoopSource that ties to a libevent source for a given platform, but this would likely be very fragile, and generally ill-advised, for a couple of reasons: Firstly, it would be based on private implementation details of RunLoop that are not part of the public API, and therefore subject to change at any time without notice. Second, assuming you didn't go through and do this for every platform supported by both Swift and libevent, you would have broken the cross-platform-ness of it, which was one of your stated reasons for going with libevent in the first place.
One additional option you might not have considered would be to use GCD by itself, without RunLoops. Look at the docs for dispatch_main. In a server application, there's (typically) nothing special about a "main thread," so dispatching to the "main queue", should be good enough (if needed at all). You can use dispatch "sources" to manage your connections, etc. I can't personally speak to how dispatch sources scale up to the C10K/C100K/etc. level, but they've seemed pretty lightweight and low-overhead in my experience. I also suspect that using GCD like this would likely be the most idiomatic way to write a server application in Swift. I've written up a small example of a GCD-based TCP echo server as part of another answer here.
If you were bound and determined to use both RunLoop and libevent in the same application, it would, as you guessed, be best to give libevent it's own separate thread, but I don't think it's as complex as you might think. You should be able to dispatch_async from libevent callbacks freely, and similarly marshal replies from GCD managed threads to libevent using libevent's multi-threading mechanisms fairly easily (i.e. either by running with locking on, or by marshaling your calls into libevent as events themselves.) Similarly, third party libraries using GCD should not be an issue even if you chose to use libevent's loop structure. GCD manages its own thread pools and would have no way of stepping on libevent's main loop, etc.
You might also consider architecting your application such that it didn't matter what concurrency and connection handling library you used. Then you could swap out libevent, GCD, CFStreams, etc. (or mix and match) depending on what worked best for a given situation or deployment. Choosing a concurrency approach is important, but ideally you wouldn't couple yourself to it so tightly that you couldn't switch if circumstances called for it.
When you have such an architecture, I'm generally a fan of the approach of using the highest level abstraction that gets the job done, and only driving down to lower level abstractions when specific circumstances require it. In this case, that would probably mean using CFStreams and RunLoops to start, and switching out to "bare" GCD or libevent later, if you hit a wall and also determined (through empirical measurement) that it was the transport layer and not the application layer that was the limiting factor. Very few non-trivial applications actually get to the C10K problem in the transport layer; things tend to have to scale "out" at the application layer first, at least for apps more complicated than basic message passing.

play - how to wrap a blocking code with futures

I am trying to understand the difference between the 2 methods, in terms of functionality.
class MyService (blockService: BlockService){
def doSomething1(): Future[Boolean] = {
//do
//some non blocking
//stuff
val result = blockService.block()
Future.successful(result)
}
def doSomething2(): Future[Boolean] = {
Future{
//do
//some non blocking
//stuff
blockService.block()
}
}
}
To my understanding the difference between the 2 is which thread is the actual thread that will be blocked.
So if there is a thread: thread_1 that execute something1, thread_1 will be the one that is blocked, while if a thread_1 executed something2a new thread will run it - thread_2, and thread_2 is the one to be blocked.
Is this true?
If so, than there is no really a preferred way to write this code? if I don't care which thread will eventually be blocked, then the end result will be the same.
dosomething1 seems like a weird way to write this code, I would choose dosomething2.
Make sense?
Yes, doSomething1 and doSomething2 blocks different threads, but depending on your scenario, this is an important decision.
As #AndreasNeumann said, you can have different execution contexts in doSomething2. Imagine that the main execution context is the one receiving HTTP requests from your users. Block threads in this context is bad because you can easily exhaust the execution context and impact requests that have nothing to do with doSomething.
Play docs have a better explanation about the possible problems with having blocking code:
If you plan to write blocking IO code, or code that could potentially do a lot of CPU intensive work, you need to know exactly which thread pool is bearing that workload, and you need to tune it accordingly. Doing blocking IO without taking this into account is likely to result in very poor performance from Play framework, for example, you may see only a few requests per second being handled, while CPU usage sits at 5%. In comparison, benchmarks on typical development hardware (eg, a MacBook Pro) have shown Play to be able to handle workloads in the hundreds or even thousands of requests per second without a sweat when tuned correctly.
In your case, both methods are being executed using Play default thread pool. I suggest you to take a look at the recommended best practices and see if you need a different execution context or not. I also suggest you to read Akka docs about Dispatchers and Futures to gain a better understanding about what executing Futures and have blocking/non-blocking code.
This approach makes sense if you make use of different execution contexts in the second method.
So having for example one for answering requests and another for blocking requests.
So you would use the normal playExecutionContext to keep you application running and answering and separate blocking operation in a different one.
def doSomething2(): Future[Boolean] = Future{
blocking { blockService.block() }
}( mySpecialExecutionContextForBlockingOperations )
For a little more information: http://docs.scala-lang.org/overviews/core/futures.html#blocking
You are correct. I don't see a point in doSomething1. It simply complicates the interface for the caller while not providing the benefits of an asynchronous API.
Does BlockService handle blocking operation?
Normally, use blocking ,as #Andreas remind,to make blocking operation into another thread is meanful.

Clarification about Scala Future that never complete and its effect on other callbacks

While re-reading scala.lan.org's page detailing Future here, I have stumbled up on the following sentence:
In the event that some of the callbacks never complete (e.g. the callback contains an infinite loop), the other callbacks may not be executed at all. In these cases, a potentially blocking callback must use the blocking construct (see below).
Why may the other callbacks not be executed at all? I may install a number of callbacks for a given Future. The thread that completes the Future, may or may not execute the callbacks. But, because one callback is not playing footsie, the rest should not be penalized, I think.
One possibility I can think of is the way ExecutionContext is configured. If it is configured with one thread, then this may happen, but that is a specific behaviour and a not generally expected behaviour.
Am I missing something obvious here?
Callbacks are called within an ExecutionContext that has an eventually limited number of threads - if not by the specific context implementation, then by the underlying operating system and/or hardware itself.
Let's say your system's limit is OS_LIMIT threads. You create OS_LIMIT + 1 callbacks. From those, OS_LIMIT callbacks immediately get a thread each - and none ever terminate.
How can you guarantee that the remaining 1 callback ever gets a thread?
Sure, there could be some detection mechanisms built into the Scala library, but it's not possible in the general case to make an optimal implementation: maybe you want the callback to run for a month.
Instead (and this seems to be the approach in the Scala library), you could provide facilities for handling situations that you, the developer, know are risky. This removes the element of surprise from the system.
Perhaps most importantly - it enables the developer to "bake in" the necessary information about handler/task characteristics directly into his/her program, rather than relying on some obscure piece of language functionality (which may change from version to version).

Is there a way to copy files in a non-blocking way in Scala?

I have checked java.nio.file.Files.copy but that blocks a thread until the copy is done. Are there any libraries that allow one to copy a file in a non-blocking way? I need to perform many of these operations simultaneously and cannot afford to have so many threads blocked.
While I could write something myself using non-blocking streams, I would rather use something tried and tested that would guarantee a correct copy every time (or detect if something went wrong).
Check this: Iterate over lines in a file in parallel (Scala)?
val chunkSize = 128 * 1024
val iterator = Source.fromFile(path).getLines.grouped(chunkSize)
iterator.foreach { lines =>
lines.par.foreach { line => process(line) }
}
Reading (copying) files by chunks in parallel. In this case "par" is used.
So it quite non-blocking in terms / scope of processors (cores).
But you may follow same idea of chunks, for example using Akka/Future/Promises to be even in wider scopes.
You may customize you chunk-size deepening on your performance characteristic, level of system load, etc..
One more link that explains possible way to do read / write data from (property) file in parallel using Akka Actors. This is not quite that you might be want, but it may give an idea.
Idea - you may build your own not-blocking way of reading / copying files.
--
And about your statement "While I could write something myself using non-blocking streams":
I would remind that each OS / File System (FS) may have its own vision about what and where to block. Like Windows blocks a file (write-block at leat) if one thread writes to it. On Linux is is configurable. So if you want to stick to something stable, I would suggest to think it out and go with your own wrapper (over FS) solution based on events, chunks, states.
I have used the Process class, issuing an operating system command to copy the file. Of course, one has to check under which OS the application is running, and issue the appropriate command, but this allows for fast and asynchronous copies.
As Marius rightly mentions in the comments, Scala Process blocks, so I run it wrapped in a Future.
Java 8 Process introduces a function isAlive(). A non-blocking alternative would be to use Java 8 processes and use the scheduler to poll at regular intervals to see if the process has finished. However, I did no need to go to this extent.
Have you checked out the async stuff in scala-io?
http://jesseeichar.github.io/scala-io-doc/0.4.2/index.html#!/core/async%20read%20write