RxJava/RxKotlin: combineLatest that already completes if one source completes (not all) - rx-java2

Basically, I have two Flowables F and G and I want to use combineLatest on them, but I want the combined Flowable to complete as soon as F completes (even if G is still running).
Here is an example of what I want to achieve, with an ugly solution:
import io.reactivex.Flowable
import io.reactivex.rxkotlin.Flowables

fun combineFandGbutTerminateIfFTerminates(F: Flowable<Int>, G: Flowable<Int>): Flowable<Pair<Int, Int>> {
    val _F = F.share()
    val _G = G.takeUntil(_F.ignoreElements().toFlowable<Nothing>())
    val FandG = Flowables.combineLatest(_F, _G)
    return FandG
}
We can extract that into an extension function:
fun <T> Flowable<T>.completeWith(other: Flowable<*>): Flowable<T> {
    return takeUntil(other.ignoreElements().toFlowable<Nothing>())
}
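Using the extension, the original function becomes (a sketch with the same behavior):
fun combineFandGbutTerminateIfFTerminates(F: Flowable<Int>, G: Flowable<Int>): Flowable<Pair<Int, Int>> {
    val sharedF = F.share()
    return Flowables.combineLatest(sharedF, G.completeWith(sharedF))
}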
Is there a nicer way to express that?

I came up with the following solution. It allows combining one master source with any number of slave sources. If the master completes, the combined Flowable completes. However, if a slave completes before the master, a SlaveCompletedPrematurelyError is propagated.
import io.reactivex.Flowable
import io.reactivex.annotations.BackpressureKind
import io.reactivex.annotations.BackpressureSupport
import io.reactivex.annotations.CheckReturnValue
import io.reactivex.annotations.SchedulerSupport
import io.reactivex.functions.Function
import io.reactivex.internal.functions.Functions
import org.reactivestreams.Publisher

class SlaveCompletedPrematurelyError(message: String) : Throwable(message)
/**
* Combine this Flowable with one slave source.
*/
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
@SchedulerSupport(SchedulerSupport.NONE)
fun <T, T1, R> Flowable<T>.combineLatestSlaves(
    slaveSource: Flowable<T1>,
    combineFunction: (T, T1) -> R
): Flowable<R> = combineLatestSlaves(Functions.toFunction(combineFunction), slaveSource)
/**
* Combine this Flowable with two slave sources.
*/
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
@SchedulerSupport(SchedulerSupport.NONE)
fun <T, T1, T2, R> Flowable<T>.combineLatestSlaves(
    slaveSource1: Flowable<T1>,
    slaveSource2: Flowable<T2>,
    combineFunction: (T, T1, T2) -> R
) =
    combineLatestSlaves(Functions.toFunction(combineFunction), slaveSource1, slaveSource2)
/**
* Combine this Flowable with three slave sources.
*/
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
@SchedulerSupport(SchedulerSupport.NONE)
fun <T, T1, T2, T3, R> Flowable<T>.combineLatestSlaves(
    slaveSource1: Flowable<T1>,
    slaveSource2: Flowable<T2>,
    slaveSource3: Flowable<T3>,
    combineFunction: (T, T1, T2, T3) -> R
) =
    combineLatestSlaves(Functions.toFunction(combineFunction), slaveSource1, slaveSource2, slaveSource3)
/**
* Combine this Flowable with many slave sources.
*/
@SchedulerSupport(SchedulerSupport.NONE)
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
fun <T : U, U, R> Flowable<T>.combineLatestSlaves(
    combiner: Function<in Array<Any>, out R>,
    vararg slaveSources: Publisher<out U>
): Flowable<R> =
    combineLatestSlaves(slaveSources, combiner, Flowable.bufferSize())
/**
* Combine this Flowable with many slave sources.
*
* This function is identical to using combineLatest with this Flowable and the slave sources, except for the following changes:
* - If this Flowable completes, the resulting Flowable completes even if the slave sources are still running.
* - If a slave source completes before this Flowable, a SlaveCompletedPrematurelyError error is triggered.
*/
@SchedulerSupport(SchedulerSupport.NONE)
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
fun <T : U, U, R> Flowable<T>.combineLatestSlaves(
    slaveSources: Array<out Publisher<out U>>,
    combiner: Function<in Array<Any>, out R>,
    bufferSize: Int
): Flowable<R> {
    val masterCompleted = Throwable()
    val sources = Array<Publisher<out U>>(slaveSources.size + 1) {
        when (it) {
            0 -> Flowable.error<U>(masterCompleted).startWith(this)
            else -> Flowable.error<U> { SlaveCompletedPrematurelyError(slaveSources[it - 1].toString()) }
                .startWith(slaveSources[it - 1])
        }
    }
    return Flowable.combineLatest(sources, combiner, bufferSize).onErrorComplete { it == masterCompleted }
}
/**
* Errors encountered in the stream for which the provided `predicate` returns true will be silently turned into graceful completion.
*/
@CheckReturnValue
@BackpressureSupport(BackpressureKind.FULL)
@SchedulerSupport(SchedulerSupport.NONE)
inline fun <T> Flowable<T>.onErrorComplete(crossinline predicate: (Throwable) -> Boolean): Flowable<T> =
    onErrorResumeNext { error: Throwable ->
        if (predicate(error)) Flowable.empty<T>() else Flowable.error<T>(error)
    }
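A quick behavioral sketch of the result (assuming the extensions above plus RxJava 2 on the classpath; the intervals and counts are arbitrary):
import io.reactivex.Flowable
import java.util.concurrent.TimeUnit

fun main() {
    val master = Flowable.interval(100, TimeUnit.MILLISECONDS).take(3)
    val slave = Flowable.interval(40, TimeUnit.MILLISECONDS) // never completes

    // Completes as soon as the master completes, even though the slave is infinite.
    master.combineLatestSlaves(slave) { m, s -> m to s }
        .blockingSubscribe({ println(it) }, { println("error: $it") }, { println("done") })

    // A slave that completes before the master triggers SlaveCompletedPrematurelyError.
    Flowable.never<Long>().combineLatestSlaves(Flowable.just(1L)) { m, s -> m to s }
        .blockingSubscribe({ println(it) }, { println("error: $it") }, { println("done") })
}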

Related

Proper set of operators to log execution time with disposals on timeouts

I need to log the time every time the upstream is subscribed to. That's fine, because you can do
fun <T> Single<T>.markTime(name: () -> String): Single<T> = this
    .doOnEvent { _, _ -> markElapsedTime(name() + "-on-event") }
However, this won't work for disposals. So we add
fun <T> Single<T>.markTime(name: () -> String): Single<T> = this
    .doOnEvent { _, _ -> markElapsedTime(name() + "-on-event") }
    .doOnDispose { markElapsedTime(name() + "-dispose") }
But this logs some events twice. How can that be avoided? There seems to be no built-in Rx operator that supports this. Note that doFinally won't work, since it performs the action only after notifying downstream; we need to perform the action before calling downstream and still be able to log disposals.
Of course you may do
fun <T> Single<T>.markTime(name: () -> String): Single<T> = compose {
    var alreadyLogged = false
    it.doOnEvent { _, _ ->
        markElapsedTime(name())
        alreadyLogged = true
    }.doOnDispose { if (!alreadyLogged) markElapsedTime(name()) }
}
But this is a hack.
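A thread-safe variant of the same compose-based idea (still a sketch, reusing markElapsedTime from the question): AtomicBoolean.compareAndSet guarantees the time is marked exactly once, whether the Single terminates first or is disposed first.
import io.reactivex.Single
import java.util.concurrent.atomic.AtomicBoolean

fun <T> Single<T>.markTimeOnce(name: () -> String): Single<T> = compose {
    // compareAndSet succeeds for exactly one of the two callbacks.
    val logged = AtomicBoolean(false)
    it.doOnEvent { _, _ -> if (logged.compareAndSet(false, true)) markElapsedTime(name()) }
        .doOnDispose { if (logged.compareAndSet(false, true)) markElapsedTime(name()) }
}
Note that, like the original hack, the flag is created when compose is applied (at assembly time), so this is only safe for a single subscription; wrapping the whole thing in Single.defer would give per-subscription state.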

Issue with defining custom partition stage (Cannot pull in port twice)

So I have this little Custom Stage for partitioning in Akka Streams.
import akka.stream.{Attributes, FanOutShape2, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.collection.mutable.{Queue => MutableQueue}

object CustomPartitioner {
  /**
   * Creates a Partition stage that, given a type A, decides whether to route to subtype B or subtype C.
   *
   * @param partitionF the partitioning function; Left routes to B, Right routes to C.
   *
   * @tparam A type of input
   * @tparam B type of output on the first outlet.
   * @tparam C type of output on the second outlet.
   *
   * @return A partition stage
   */
  def apply[A, B, C](partitionF: A => Either[B, C]) =
    new GraphStage[FanOutShape2[A, B, C]] {
      private val in: Inlet[A] = Inlet[A]("in")
      private val outB = Outlet[B]("outB")
      private val outC = Outlet[C]("outC")
      private val pendingB = MutableQueue.empty[B]
      private val pendingC = MutableQueue.empty[C]

      override def shape: FanOutShape2[A, B, C] = new FanOutShape2(in, outB, outC)

      override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new GraphStageLogic(shape) with InHandler with OutHandler {
          setHandler(in, this)
          setHandler(outB, this)
          setHandler(outC, this)

          override def onPush(): Unit = {
            val elem = grab(in)
            partitionF(elem) match {
              case Left(b) =>
                pendingB.enqueue(b)
                tryPush(outB, pendingB)
              case Right(c) =>
                pendingC.enqueue(c)
                tryPush(outC, pendingC)
            }
          }

          override def onPull(): Unit = pull(in)

          private def tryPush[T](out: Outlet[T], pending: MutableQueue[T]): Unit =
            if (isAvailable(out) && pending.nonEmpty) push(out, pending.dequeue())
        }
    }
}
I have hooked this as a partitioner into a flow and then merged it back into a sink.
When I try to push a message through the stream in a component test, I get
java.lang.IllegalArgumentException: Cannot pull port (in(256390569)) twice
and then the test fails with
java.lang.AssertionError: assertion failed: expected: expecting request() signal but got unexpected message CancelSubscription(PublisherProbeSubscription(akka.stream.impl.fusing.ActorGraphInterpreter$BatchingActorInputBoundary$$anon$1#53c99b09,akka.testkit.TestProbe#2539cd1c))
I am pretty certain I am messing up the setHandler calls, since the same handler object is registered for both outB and outC. However, I do not know how to fix it so that the shared inlet is only pulled once.
I managed to get it to work by guarding the pull. Since both outlets share a single OutHandler, onPull fires once for each outlet's demand, and the second call tries to pull an inlet that has already been pulled; hasBeenPulled prevents the double pull:
override def onPull(): Unit =
  if (!hasBeenPulled(in))
    pull(in)

Scala Partial Function Application Semantics + locking with synchronized

Based on my previous question on locking based on value-equality rather than reference-equality, I came up with the following implementation:
/**
 * A util that provides synchronization using value equality rather than referential equality.
 * It is guaranteed that if two objects are value-equal, their corresponding blocks are invoked mutually exclusively.
 * The converse may not be true: even if two objects are not value-equal, their blocks may still exclude each other.
 * Note: Typically there is no need to create instances of this class. The default instance in the companion object can be safely reused.
 *
 * @param size There is a 1/size probability that two invocations that could have run concurrently are serialized instead.
 *
 * Example usage:
 * import EquivalenceLock.{defaultInstance => lock}
 * def run(person: Person) = lock(person) { .... }
 */
class EquivalenceLock(val size: Int) {
  private[this] val locks = IndexedSeq.fill(size)(new Object())
  def apply[U](lock: Any)(f: => U) = locks(lock.hashCode().abs % size).synchronized(f)
}

object EquivalenceLock {
  implicit val defaultInstance = new EquivalenceLock(1 << 10)
}
I wrote some tests to verify that my lock functions as expected:
import EquivalenceLock.{defaultInstance => lock}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.mutable
val journal = mutable.ArrayBuffer.empty[String]
def log(msg: String) = journal.synchronized {
  println(msg)
  journal += msg
}

def test(id: String, napTime: Int) = Future {
  lock(id) {
    log(s"Entering $id=$napTime")
    Thread.sleep(napTime * 1000L)
    log(s"Exiting $id=$napTime")
  }
}
test("foo", 5)
test("foo", 2)
Thread.sleep(20 * 1000L)
val validAnswers = Set(
  Seq("Entering foo=5", "Exiting foo=5", "Entering foo=2", "Exiting foo=2"),
  Seq("Entering foo=2", "Exiting foo=2", "Entering foo=5", "Exiting foo=5")
)
println(s"Final state = $journal")
assert(validAnswers(journal))
The above test works as expected (verified over millions of runs). But when I change the following line:
def apply[U](lock: Any)(f: => U) = locks(lock.hashCode().abs % size).synchronized(f)
to this:
def apply[U](lock: Any) = locks(lock.hashCode().abs % size).synchronized _
the tests fail.
Expected:
Entering foo=5
Exiting foo=5
Entering foo=2
Exiting foo=2
OR
Entering foo=2
Exiting foo=2
Entering foo=5
Exiting foo=5
Actual:
Entering foo=5
Entering foo=2
Exiting foo=2
Exiting foo=5
The above two pieces of code should behave identically, and yet the test fails for the second flavor (the one with partial application): lock(id) always enters concurrently for the same id. Why?
By default, function parameters are evaluated eagerly, and eta-expansion produces a function whose parameter is by-value. So
def apply[U](lock: Any) = locks(lock.hashCode().abs % size).synchronized _
is equivalent to
def apply[U](lock: Any)(f: U) = locks(lock.hashCode().abs % size).synchronized(f)
Note the parameter type is f: U, not f: => U: the block f is fully evaluated before the monitor is acquired, so the lock merely guards returning the already-computed result.
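For intuition, here is an analogous illustration in Kotlin (hypothetical lockedEager/lockedLazy helpers), where the difference between a plain value parameter and a function parameter is explicit:
// The argument expression is evaluated by the caller, before the lock is taken.
fun <U> lockedEager(monitor: Any, value: U): U = synchronized(monitor) { value }

// The body only runs inside the critical section.
fun <U> lockedLazy(monitor: Any, body: () -> U): U = synchronized(monitor) { body() }

fun main() {
    val monitor = Any()
    lockedEager(monitor, println("holding lock: ${Thread.holdsLock(monitor)}")) // prints false
    lockedLazy(monitor) { println("holding lock: ${Thread.holdsLock(monitor)}") } // prints true
}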

Parallel processing pattern in Scala

I hope this is not a stupid question or I'm missing something obvious. I'm following the Coursera parallel programming class, and in week 1 they have the following code to run tasks in parallel (may differ slightly, since I typed mine in):
import java.util.concurrent.{ForkJoinPool, ForkJoinTask, ForkJoinWorkerThread, RecursiveTask}
import scala.util.DynamicVariable

object parallelism {
  val forkJoinPool = new ForkJoinPool

  abstract class TaskScheduler {
    def schedule[T](body: => T): ForkJoinTask[T]
    def parallel[A, B](taskA: => A, taskB: => B): (A, B) = {
      val right = task {
        taskB
      }
      val left = taskA
      (left, right.join())
    }
  }

  class DefaultTaskScheduler extends TaskScheduler {
    def schedule[T](body: => T): ForkJoinTask[T] = {
      val t = new RecursiveTask[T] {
        def compute = body
      }
      Thread.currentThread match {
        case wt: ForkJoinWorkerThread => t.fork()
        case _ => forkJoinPool.execute(t)
      }
      t
    }
  }

  val scheduler =
    new DynamicVariable[TaskScheduler](new DefaultTaskScheduler)

  def task[T](body: => T): ForkJoinTask[T] = {
    scheduler.value.schedule(body)
  }

  def parallel[A, B](taskA: => A, taskB: => B): (A, B) = {
    scheduler.value.parallel(taskA, taskB)
  }
}
I wrote a unit test that goes something like this:
test("Test two task parallelizer") {
val (r1, t1) = timed {
( sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000))
}
val (r2, t2) = timed {
parallel (
sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000)
)
}
assert(t2 < t1)
}
test("Test four task parallelizer") {
val (r1, t1) = timed {
(sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000))
}
val (r2, t2) = timed {
parallel (
parallel (
sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000)
),
parallel (
sieveOfEratosthenes(100000),
sieveOfEratosthenes(100000)
)
)
}
assert(t2 < t1)
}
On the first test, I get good savings (300 ms down to 50 ms), but on the second test, I only get about 20 ms of savings, and if I run it often enough the time may actually increase and fail my test. (The second value in the tuple returned by timed is the time in milliseconds.)
The test method is first version from here: https://rosettacode.org/wiki/Sieve_of_Eratosthenes#Scala
Can someone explain what is going on in the second test? If it matters, I'm running on a single quad-core i5 CPU. The number of threads I create doesn't seem to make much difference for this particular test.
The implementation of sieveOfEratosthenes you chose is already parallel (it is built on ParSet), so parallelizing it further won't help: the concurrent sieves are already competing for the same cores, and the extra parallel() wrappers only add scheduling overhead.
The speedup you see in the first test is probably JIT warm-up.
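To make the benchmark measure only the parallel() wrappers, the sieve itself has to be sequential. A minimal array-based sketch (a hypothetical sequentialSieve, shown in Kotlin; a Scala port is mechanical):
fun sequentialSieve(limit: Int): List<Int> {
    val composite = BooleanArray(limit + 1)
    var i = 2
    while (i.toLong * i <= limit) {
        if (!composite[i]) {
            // Mark every multiple of i, starting at i squared.
            var j = i * i
            while (j <= limit) { composite[j] = true; j += i }
        }
        i++
    }
    return (2..limit).filterNot { composite[it] }
}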

Is there a difference between partial application and returning a function?

In terms of what happens under the hood (stack/heap allocation, garbage collection, resources, and performance), what is the difference between the following three?
def Do1(a: String) = { (b: String) => { println(a, b) } }
def Do2(a: String)(b: String) = { println(a, b) }
def Do3(a: String, b: String) = { println(a, b) }
Do1("a")("b")
Do2("a")("b")
(Do3("a", _: String))("b")
Except, of course, for the obvious surface differences in how many arguments each takes and returns.
Decompiling the following class (note the additional call to Do2 compared to your question):
class Test {
  def Do1(a: String) = { (b: String) => { println(a, b) } }
  def Do2(a: String)(b: String) = { println(a, b) }
  def Do3(a: String, b: String) = { println(a, b) }

  Do1("a")("b")
  Do2("a")("b")
  (Do2("a") _)("b")
  (Do3("a", _: String))("b")
}
yields this pure Java code:
public class Test {
    public Function1<String, BoxedUnit> Do1(final String a) {
        new AbstractFunction1() {
            public final void apply(String b) {
                Predef.MODULE$.println(new Tuple2(a, b));
            }
        };
    }

    public void Do2(String a, String b) {
        Predef.MODULE$.println(new Tuple2(a, b));
    }

    public void Do3(String a, String b) {
        Predef.MODULE$.println(new Tuple2(a, b));
    }

    public Test() {
        Do1("a").apply("b");
        Do2("a", "b");
        new AbstractFunction1() {
            public final void apply(String b) {
                Test.this.Do2("a", b);
            }
        }.apply("b");
        new AbstractFunction1() {
            public final void apply(String x$1) {
                Test.this.Do3("a", x$1);
            }
        }.apply("b");
    }
}
(this code doesn't compile, but it suffices for analysis)
Let's look at it part by part (Scala & Java in each listing):
def Do1(a: String) = { (b: String) => { println(a, b) } }
public Function1<String, BoxedUnit> Do1(final String a) {
    new AbstractFunction1() {
        public final void apply(String b) {
            Predef.MODULE$.println(new Tuple2(a, b));
        }
    };
}
No matter how Do1 is called, a new Function object is created.
def Do2(a: String)(b: String) = { println(a, b) }
public void Do2(String a, String b) {
    Predef.MODULE$.println(new Tuple2(a, b));
}
def Do3(a: String, b: String) = { println(a, b) }
public void Do3(String a, String b) {
    Predef.MODULE$.println(new Tuple2(a, b));
}
Do2 and Do3 compile down to the same bytecode. The difference is exclusively in the @ScalaSignature annotation.
Do1("a")("b")
Do1("a").apply("b");
Do1 is straightforward: the returned function is immediately applied.
Do2("a")("b")
Do2("a", "b");
With Do2, the compiler sees that this is not a partial application, and compiles it to a single method invocation.
(Do2("a") _)("b")
new AbstractFunction1() {
    public final void apply(String b) {
        Test.this.Do2("a", b);
    }
}.apply("b");
(Do3("a", _: String))("b")
new AbstractFunction1() {
    public final void apply(String x$1) {
        Test.this.Do3("a", x$1);
    }
}.apply("b");
Here, Do2 and Do3 are first partially applied, then the returned functions are immediately applied.
Conclusion:
I would say that Do2 and Do3 are mostly equivalent in the generated bytecode. A full application results in a simple, cheap method call; a partial application generates an anonymous Function class at the caller. Which variant you use depends mostly on the intent you're trying to communicate.
Do1 always creates a function object, but does so in the called code rather than at each call site. If you expect the function to be partially applied a lot, using this variant will reduce your code size, and may trigger the JIT compiler earlier, because the same code is called more often. Full application will be slower, at least until the JIT compiler inlines and subsequently eliminates the object creation at individual call sites. I'm not an expert on this, so I don't know whether you can expect that kind of optimization; my best guess is that you can, for pure functions.