Akka Future - Parallel versus Concurrent?

From the well-written Akka Concurrency:
As I understand it, the diagram points out that both numSummer and charConcat will run on the same thread.
Is it possible to run each Future in parallel, i.e. on separate threads?

The picture on the left is them running in parallel.
The point of the illustration is that the Future.apply method is what kicks off the execution, so if it doesn't happen until the first future's result is flatMapped (as in the picture on the right), then you don't get the parallel execution.
(Note that by "kicked off" I mean the relevant ExecutionContext is told about the job. How it parallelizes is a different question and may depend on things like the size of its thread pool.)
Equivalent code for the left:
val numSummer = Future { ... } // execution kicked off
val charConcat = Future { ... } // execution kicked off
numSummer.flatMap { numsum =>
  charConcat.map { string =>
    (numsum, string)
  }
}
and for the right:
Future { ... } // execution kicked off
  .flatMap { numsum =>
    Future { ... } // execution kicked off (note that this does not happen
                   // until the first future's result (`numsum`) is available)
      .map { string =>
        (numsum, string)
      }
  }
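To make the difference concrete, here is a self-contained sketch (not from the original answer; the sleeps, values, and timings are made up for illustration) that can be pasted into a REPL. The left-hand shape finishes in roughly one second because both futures run at the same time, while the right-hand shape takes roughly two seconds because the second future is only created once the first has completed:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def slow[A](a: A): A = { Thread.sleep(1000); a } // stand-in for real work

// Left: both futures are kicked off before composition -> roughly 1 second total.
val parallel = {
  val numSummer  = Future(slow(42))
  val charConcat = Future(slow("abc"))
  numSummer.flatMap(numsum => charConcat.map(string => (numsum, string)))
}

// Right: the second future is created inside flatMap -> roughly 2 seconds total.
val sequential =
  Future(slow(42)).flatMap(numsum => Future(slow("abc")).map(string => (numsum, string)))

println(Await.result(parallel, 5.seconds))
println(Await.result(sequential, 5.seconds))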

Set default Execution Context in ZIO

I am trying to use a TrampolineExecutionContext in ZIO in order to test background stream subscriptions on the same thread (so I can run effects in the order I would expect).
testM("Using trampoline execution context") {
(for {
queue <- Queue.unbounded[String]
_ <- ZStream
.fromQueue(queue)
.take(1)
.foreach(el => ZIO.effect(println(s"In Stream $el")))
.fork
_ <- queue.offer("Element")
_ <- ZIO.effect(println("Inside for comprehension")).on(trampolineExecutionContext)
} yield {
assert(1)(equalTo(1))
}).on(trampolineExecutionContext)
}
In this situation, I obtain what I would expect, that is:
"In Stream Element", "Inside for comprehension"
If I remove the on(trampolineExecutionContext), I obtain only "Inside for comprehension", because I am not joining the fiber (creating a sync point).
How can I set for the entire test the default context to be trampolineExecutionContext without repeating it every time in every call or in the important calls?
Maybe it's not exactly what you need, but you can try to override the runner method of DefaultRunnableSpec and replace the main context with the trampoline execution context:
override def runner = {
  super.runner.withPlatform(_.withExecutor(
    Executor.fromExecutionContext(1)(
      trampolineExecutionContext
    )
  ))
}
In this case you will only need one on(trampolineExecutionContext) at the end of the test instead of two.
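For reference, trampolineExecutionContext is never defined in the question or the answer. A minimal trampoline-style ExecutionContext (a sketch; a production version would usually also queue re-entrant tasks to avoid deep stacks) could look like this:

import scala.concurrent.ExecutionContext

// Runs every task immediately on the calling thread instead of handing it to
// a pool, which keeps test effects executing in a deterministic order.
val trampolineExecutionContext: ExecutionContext = new ExecutionContext {
  def execute(runnable: Runnable): Unit = runnable.run()
  def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
}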

Using `onBackpressureLatest` to drop intermediate messages in blocking Flowable

I have a chain where I do some blocking IO calls (e.g. an HTTP call). I want the blocking call to consume a value, proceed without being interrupted, drop everything that piles up in the meantime, and then consume the next value in the same manner.
Consider the following example:
fun main() {
    Flowable.interval(100, TimeUnit.MILLISECONDS).onBackpressureLatest().map {
        Thread.sleep(1000)
        it
    }.blockingForEach { println(it) }
}
From a naive point of view, I would expect it to print something like 0, 10, 20, ..., but it prints 0, 1, 2, ....
What am I doing wrong?
EDIT:
I thought about naively adding debounce to eat up the incoming stream:
fun main() {
    Flowable.interval(100, TimeUnit.MILLISECONDS)
        .debounce(0, TimeUnit.MILLISECONDS)
        .map {
            Thread.sleep(1000)
            it
        }
        .blockingForEach { println(it) }
}
But, now I get a java.lang.InterruptedException: sleep interrupted.
EDIT:
What seems to work is the following:
fun main() {
    Flowable.interval(100, TimeUnit.MILLISECONDS)
        .throttleLast(0, TimeUnit.MILLISECONDS)
        .map {
            Thread.sleep(1000)
            it
        }
        .blockingForEach { println(it) }
}
The output is 0, 10, 20, ..., as expected!
Is that the correct way?
I noted that throttleLast switches to the computation scheduler. Is there a way to go back to the original scheduler?
EDIT:
I also get an occasional java.lang.InterruptedException: sleep interrupted with that variant.
The simplest approach to solve the problem is:
fun <T> Flowable<T>.lossy(): Flowable<T> {
    return onBackpressureLatest().observeOn(Schedulers.io(), false, 1)
}
By calling lossy on a Flowable, it drops all elements that come in faster than the downstream consumer can process them.

Can Spark ForEachPartitionAsync be async on worker nodes?

I'm writing a custom Spark sink. In my addBatch method I use foreachPartitionAsync which, if I'm not wrong, only makes the driver work asynchronously, returning a future.
val work: FutureAction[Unit] = rdd.foreachPartitionAsync { rows =>
  val sourceInfo: StreamSourceInfo = serializeRowsAsInputStream(schema, rows)

  val ackIngestion = Future {
    ingestRows(sourceInfo)
  } andThen {
    case Success(ingestion) => ackIngestionDone(partitionId, ingestion)
  }

  Await.result(ackIngestion, timeOut) // I would like to remove this line..
}
work onSuccess {
  case _ => // move data from temporary table, report success of all workers
}

work onFailure {
  // delete tmp data
  case t => throw t.getCause
}
I can't find a way to run the worker nodes without blocking on the Await call: if I remove it, success is reported to the work future object even though the inner futures haven't really finished.
Is there a way to report to the driver that all the workers finished their asynchronous jobs?
Note: I looked at the foreachPartitionAsync function and it has only one implementation that expects a function returning Unit (I would've expected it to have another one returning a future, or maybe a CountDownLatch..)
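One way to read the note above: the FutureAction returned by foreachPartitionAsync completes as soon as the Unit-returning closure has returned on every partition, so any work started in an un-awaited inner Future is invisible to the driver. A sketch of the synchronous alternative, reusing the placeholder names from the question (serializeRowsAsInputStream, ingestRows, ackIngestionDone, partitionId):

// Sketch: do the ingest directly on the executor thread, so the closure only
// returns (and the driver-side FutureAction only completes) once every
// partition has actually finished its work.
val work: FutureAction[Unit] = rdd.foreachPartitionAsync { rows =>
  val sourceInfo: StreamSourceInfo = serializeRowsAsInputStream(schema, rows)
  val ingestion = ingestRows(sourceInfo)   // blocking call on the executor
  ackIngestionDone(partitionId, ingestion) // acknowledge before returning
}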

Difference between Promise.failure and throwing exception?

Is there any difference between these two ways of completing a failed Future? If so, which way is considered to be more "correct"?
Calling Promise.failure:
def functionThatFinishesLater: Future[String] = {
  val myPromise = Promise[String]
  Future {
    // Do something that might fail
    if (failed) {
      myPromise.failure(new RuntimeException("message")) // complete with throwable
    } else {
      myPromise.success("yay!")
    }
  }(aDifferentExecutionContext)
  myPromise.future
}
Or just throwing an exception
def functionThatFinishesLater: Future[String] = {
  val myPromise = Promise[String]
  Future {
    // Do something that might fail
    if (failed) {
      throw new RuntimeException("message") // throw the exception
    } else {
      myPromise.success("yay!")
    }
  }(aDifferentExecutionContext)
  myPromise.future
}
It looks to me like you're mixing paradigms. A Promise is an imperative way of completing a Future, but a Future can also be completed by wrapping the computation in the Future constructor. You're doing both, which is probably not what you want. The second statement in both code fragments is of type Future[Promise[String]], and I'm almost certain you really want just Future[String].
If you're using the Future.apply constructor, you should just treat the value produced as the Future, rather than using it to resolve a separate Promise value:
val myFuture = Future {
  // Do some long operation that might fail
  if (failed) {
    throw new RuntimeException("message")
  } else {
    "yay!"
  }
}
The way to use a Promise is to create the Promise, give its Future to some other piece of code that cares, and then use .success(...) or .failure(...) to complete it after some long-running operation. So to recap, the big difference is that Future has to wrap the whole computation, but you can pass a Promise around and complete it elsewhere if you need to.
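As a concrete illustration of that last paragraph, here is a minimal sketch (the worker thread and the names are made up for the example) where the Promise is created in one place and completed somewhere else:

import scala.concurrent.{Future, Promise}
import scala.util.control.NonFatal

def functionThatFinishesLater: Future[String] = {
  val myPromise = Promise[String]()

  // Hand the work to some other piece of code; a plain thread stands in for
  // "somewhere else". That code completes the promise when it is done.
  new Thread(() => {
    try {
      val result = "yay!" // do something that might fail
      myPromise.success(result)
    } catch {
      case NonFatal(e) => myPromise.failure(e)
    }
  }).start()

  myPromise.future // callers only ever see the Future; the Promise stays private
}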

How to Merge or skip duplicate messages in a Scala Actor?

Let's say you have a GUI component, and 10 threads all tell it to repaint at roughly the same time, so that they all arrive before a single paint operation takes place. Instead of naively wasting resources repainting 10 times, just merge/ignore all but the last one and repaint once (or, more likely, twice: once for the first and once for the last). My understanding is that the Swing repaint manager does this.
Is there a way to accomplish this same type of behavior in a Scala Actor? Is there a way to look at the queue and merge messages, or ignore all but the last of a certain type or something?
Something like this?:
act =
  loop {
    react {
      case Repaint(a, b) =>
        if (lastRepaint + minInterval < System.currentTimeMillis) {
          lastRepaint = System.currentTimeMillis
          repaint(a, b)
        }
    }
  }
If you want to repaint whenever the actor's thread gets a chance, but no more, then:
(UPDATE: repainting using the last message arguments)
act =
  loop {
    react {
      case r @ Repaint(_, _) =>
        var lastMsg = r
        def findLast: Unit = {
          reactWithin(0) {
            case r @ Repaint(_, _) =>
              lastMsg = r
              findLast // keep draining until no more Repaint messages are queued
            case TIMEOUT =>
              repaint(lastMsg.a, lastMsg.b)
          }
        }
        findLast
    }
  }
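The snippets above assume a few surrounding definitions that never appear in the question. A minimal sketch (the field types of Repaint, the minInterval value, and the repaint body are made up for illustration):

// Hypothetical message and state used by the snippets above.
case class Repaint(a: Int, b: Int)

var lastRepaint = 0L   // time of the last repaint, in millis
val minInterval = 100L // minimum time between repaints, in millis

def repaint(a: Int, b: Int): Unit =
  println(s"repainting ($a, $b)")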