Split Rx Observable into multiple streams and process individually - reactive-programming

Here is a picture of what I am attempting to accomplish.
--a-b-c-a--bbb--a
split into
--a-----a-------a --> a stream
----b------bbb--- --> b stream
------c---------- --> c stream
Then, be able to
a.subscribe()
b.subscribe()
c.subscribe()
So far, everything I have found splits the stream using groupBy(), but then collapses everything back into a single stream and processes all the groups in the same function. What I want to do is process each derived stream in a different way.
The way I'm doing it right now is doing a bunch of filters. Is there a better way to do this?

Easy as pie, just use filter.
An example in Scala:
import rx.lang.scala.Observable
val o: Observable[String] = Observable.just("a", "b", "c", "a", "b", "b", "b", "a")
val hotO: Observable[String] = o.share
val aSource: Observable[String] = hotO.filter(x ⇒ x == "a")
val bSource: Observable[String] = hotO.filter(x ⇒ x == "b")
val cSource: Observable[String] = hotO.filter(x ⇒ x == "c")
aSource.subscribe(o ⇒ println("A: " + o), println, () ⇒ println("A Completed"))
bSource.subscribe(o ⇒ println("B: " + o), println, () ⇒ println("B Completed"))
cSource.subscribe(o ⇒ println("C: " + o), println, () ⇒ println("C Completed"))
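You just need to make sure that the source observable is hot. The easiest way is to share it. Note that with a synchronous source such as Observable.just, the first subscribe may drain the whole sequence before the other subscribers are attached; in that case, use publish and call connect once all subscribers are in place.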

You don't have to collapse Observables from groupBy. You can instead subscribe to them.
Something like this:
String[] inputs= {"a", "b", "c", "a", "b", "b", "b", "a"};
Action1<String> a = s -> System.out.print("-a-");
Action1<String> b = s -> System.out.print("-b-");
Action1<String> c = s -> System.out.print("-c-");
Observable
.from(inputs)
.groupBy(s -> s)
.subscribe((g) -> {
if ("a".equals(g.getKey())) {
g.subscribe(a);
}
if ("b".equals(g.getKey())) {
g.subscribe(b);
}
if ("c".equals(g.getKey())) {
g.subscribe(c);
}
});
If statements look kinda ugly but at least you can handle each stream separately. Maybe there is a way of avoiding them.
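One way to avoid the if chain is a lookup table from group key to handler. The sketch below shows the idea with plain Scala collections rather than RxJava; the `handlers` table and the `dispatch` helper are hypothetical names invented for this illustration. With RxJava's groupBy, the same lookup on `g.getKey()` would presumably go inside the subscribe callback.

```scala
object GroupDispatch {
  // Hypothetical per-key handlers; each key gets its own processing function.
  val handlers: Map[String, String => String] = Map(
    "a" -> (s => s"-$s-"),
    "b" -> (s => s.toUpperCase),
    "c" -> (s => s * 2)
  )

  // Groups the inputs by value and applies the handler registered for each key.
  // Groups whose key has no handler are dropped.
  def dispatch(inputs: Seq[String]): Map[String, Seq[String]] =
    inputs.groupBy(identity).collect {
      case (key, group) if handlers.contains(key) => key -> group.map(handlers(key))
    }
}
```

Adding a new stream then means adding one entry to the table instead of another if block.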

I have been thinking about this, and Tomas's solution is OK, but the issue is that it converts the stream to a hot observable.
You can use share in combination with defer to get a cold observable that still splits into separate streams internally.
For example (Java):
var originalObservable = ...; // some source
var coldObservable = Observable.defer(() -> {
var shared = originalObservable.share();
var aSource = shared.filter(x -> x.equals("a"));
var bSource = shared.filter(x -> x.equals("b"));
var cSource = shared.filter(x -> x.equals("c"));
// some logic for sources
return shared;
});

In RxJava there is a special version of publish operator that takes a function.
ObservableTransformer {
it.publish { shared ->
Observable.merge(
shared.ofType(x).compose(transformerherex),
shared.ofType(y).compose(transformerherey)
)
}
}
This splits the event stream by type. You can then process each type separately by composing it with a different transformer. All of them share a single subscription to the source.

Related

Is it faster to create a new Map or clear it and use again?

I need to use many Maps in my project so I wonder which way is more efficient:
val map: mutable.Map[Int, Int] = mutable.Map.empty
for (_ <- 0 until big_number)
{
// do something with map
map.clear()
}
or
for (_ <- 0 until big_number)
{
val map: mutable.Map[Int, Int] = mutable.Map.empty
// do something with map
}
to use in terms of time and memory?
Well, my formal answer would always be "it depends". You need to benchmark your own scenario and see which option fits it better. I'll provide an example of how you can try benchmarking your own code. Let's start by writing a measuring method:
def measure(name: String, f: () => Unit): Unit = {
val l = System.currentTimeMillis()
println(name + ": " + (System.currentTimeMillis() - l))
f()
println(name + ": " + (System.currentTimeMillis() - l))
}
Let's assume that in each iteration we need to insert into the map one key-value pair, and then to print it:
Await.result(Future.sequence(Seq(Future {
measure("inner", () => {
for (i <- 0 until 10) {
val map2 = mutable.Map.empty[Int, Int]
map2(i) = i
println(map2)
}
})
},
Future {
measure("outer", () => {
val map1 = mutable.Map.empty[Int, Int]
for (i <- 0 until 10) {
map1(i) = i
println(map1)
map1.clear()
}
})
})), 10.seconds)
The output in this case is almost always equal between the inner and the outer. Please note that I run the two options in parallel, because when I run them one after the other, whichever runs first always takes significantly more time (most likely because of JVM warm-up).
Therefore, we can conclude that in this case they are almost the same.
But, if for example I add an immutable option:
Future {
measure("immutable", () => {
for (i <- 0 until 10) {
val map1 = Map[Int, Int](i -> i)
println(map1)
}
})
}
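it always ends up first. A plausible explanation is that a one-entry immutable Map is a very small specialized object that is cheap to create. With only 10 iterations dominated by println, though, this should not be read as immutable collections being faster than mutable ones in general.
For more reliable performance tests you should probably use a dedicated benchmarking tool, such as ScalaMeter or JMH.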
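As a rough illustration of the warm-up issue mentioned above, here is a minimal hand-rolled sketch that runs the body a few times before timing it and drops the println calls from the measured code. The names `MapBench`, `timeNanos`, `freshMaps` and `reusedMap` are made up for this example, and the approach is only a sanity check, not a substitute for a real benchmarking harness.

```scala
import scala.collection.mutable

object MapBench {
  // Runs `body` `warmup` times untimed, then returns the elapsed
  // nanoseconds of one additional run.
  def timeNanos(warmup: Int)(body: => Any): Long = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // Variant 1: allocate a fresh map in every iteration.
  // Returns a checksum so the work cannot be trivially optimized away.
  def freshMaps(iterations: Int): Long = {
    var sum = 0L
    for (i <- 0 until iterations) {
      val m = mutable.Map.empty[Int, Int]
      m(i) = i
      sum += m.size
    }
    sum
  }

  // Variant 2: allocate one map and clear it after every iteration.
  def reusedMap(iterations: Int): Long = {
    val m = mutable.Map.empty[Int, Int]
    var sum = 0L
    for (i <- 0 until iterations) {
      m(i) = i
      sum += m.size
      m.clear()
    }
    sum
  }
}
```

A possible usage is `println(MapBench.timeNanos(20)(MapBench.freshMaps(100000)))` next to the same call with `reusedMap`; the numbers will vary between runs and machines.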

How can I elegantly return a map while doing work

I'm new to Scala, coming over from Java, and I'm having trouble elegantly returning Map from this function. What's an elegant way to rewrite this function so it doesn't have this awful repetition?
val data = getData
if (someTest(data)) {
val D = doSomething(data)
val E = doWork(D)
if (someTest2(E)) {
val a = A()
val b = B()
Map(a -> b)
} else {
Map.empty
}
} else {
Map.empty
}
If you have a problem with connecting too many conditions with &&, you can put everything into the natural short-circuiting monad (namely Option), perform a bunch of filter and map steps on it, replace the result with Map(A() -> B()) if all the tests succeed, and then unwrap the Option with a getOrElse at the end:
Option(getData)
.filter(someTest)
.map(doSomething andThen doWork)
.filter(someTest2)
.map(_ => Map(A() -> B()))
.getOrElse(Map.empty)
In this way, you can organize your code "more vertically".
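To see the short-circuiting in action, here is a self-contained sketch of the same chain. The bodies of `someTest`, `doSomething`, `doWork` and `someTest2` are hypothetical stand-ins (the question does not show them), and they are declared as function values so that `andThen` can be used directly.

```scala
object OptionChain {
  // Hypothetical stand-ins for the functions in the question.
  val someTest: Int => Boolean = _ > 0
  val doSomething: Int => Int = _ * 2
  val doWork: Int => Int = _ + 1
  val someTest2: Int => Boolean = _ < 100

  // Every step after a failed filter is skipped, and the chain
  // falls through to getOrElse.
  def result(getData: => Int): Map[String, Int] =
    Option(getData)
      .filter(someTest)
      .map(doSomething andThen doWork)
      .filter(someTest2)
      .map(e => Map("answer" -> e))
      .getOrElse(Map.empty[String, Int])
}
```

With these stand-ins, `OptionChain.result(3)` passes both tests, while `OptionChain.result(-1)` fails the first test and `OptionChain.result(60)` fails the second; both failures produce the empty map.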
Andrey's answer is correct, but the logic can also be written using a for comprehension:
(for {
data <- Option(getData) if someTest(data)
d = doSomething(data)
e = doWork(d) if someTest2(e)
} yield {
Map(A() -> B())
}).getOrElse(Map.empty)
This retains a bit more of the original form of the code, but it is a matter of taste which version to use. You can also put the if on a separate line if that makes it clearer.
Note that I have retained the values of d and e on the assumption that they are actually meaningful in the real code. If not then there can be a single if expression that does all the tests, as noted in other answers:
(for {
data <- Option(getData)
if someTest(data) && someTest2(doWork(doSomething(data)))
} yield {
Map(A() -> B())
}).getOrElse(Map.empty)
If by repetition you mean the duplicated else blocks returning Map.empty, you can rewrite the code to take advantage of short-circuit evaluation:
val data = getData
if (someTest(data) && someTest2(doWork(doSomething(data)))) {
val a = A()
val b = B()
Map(a -> b)
} else {
Map.empty
}
Second solution using lazy evaluation:
val data = getData
lazy val D = doSomething(data)
lazy val E = doWork(D)
if (someTest(data) && someTest2(E)) {
val a = A()
val b = B()
Map(a -> b)
} else {
Map.empty
}
D, E and someTest2(E) won't get evaluated if someTest(data) is false.
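The claim about lazy evaluation can be checked with a small counter. This is only a sketch: `doWork`, the tests `data > 0` and `e < 100`, and the `doWorkCalls` counter are invented for the demonstration.

```scala
object LazyDemo {
  // Counts how often the (hypothetically expensive) step actually runs.
  var doWorkCalls = 0

  def doWork(d: Int): Int = { doWorkCalls += 1; d + 1 }

  def run(data: Int): Map[String, Int] = {
    // Not evaluated here; only on first access, and then at most once.
    lazy val e = doWork(data * 2)
    if (data > 0 && e < 100) Map("answer" -> e) else Map.empty[String, Int]
  }
}
```

When the first test fails, && never touches `e`, so `doWork` is not called at all; when it passes, `e` is computed once even though it is read twice.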

Handle Akka stream's first element specially

Is there an idiomatic way of handling Akka stream's Source first element in a special way? What I have now is:
var firstHandled = false
source.map { elem =>
if(!firstHandled) {
//handle specially
firstHandled = true
} else {
//handle normally
}
}
Thanks
While I would generally go with Ramon's answer, you could also use prefixAndTail, with a prefix of 1, together with flatMapConcat to achieve something similar:
val src = Source(List(1, 2, 3, 4, 5))
val fst = Flow[Int].map(i => s"First: $i")
val rst = Flow[Int].map(i => s"Rest: $i")
val together = src.prefixAndTail(1).flatMapConcat { case (head, tail) =>
// `head` is a Seq of the prefix elements, which in our case is
// just the first one. We can convert it to a source of just
// the first element, processed via our fst flow, and then
// concatenate `tail`, which is the remainder...
Source(head).via(fst).concat(tail.via(rst))
}
Await.result(together.runForeach(println), 10.seconds)
// First: 1
// Rest: 2
// Rest: 3
// Rest: 4
// Rest: 5
This of course works not just for the first item, but for the first N items, with the proviso that those items will be taken up as a strict collection.
Using zipWith
You could zip the original Source with a Source of Booleans that only returns true the first time. This zipped Source can then be processed.
First we'll need a Source that emits the Booleans:
//true, false, false, false, ...
def firstTrueIterator() : Iterator[Boolean] =
(Iterator single true) ++ (Iterator continually false)
def firstTrueSource : Source[Boolean, _] =
Source fromIterator firstTrueIterator
We can then define a function that handles the two different cases:
type Data = ???
type OutputData = ???
def processData(data : Data, firstRun : Boolean) : OutputData =
if(firstRun) { ... }
else { ... }
This function can then be used in a zipWith of your original Source:
val originalSource : Source[Data,_] = ???
val contingentSource : Source[OutputData,_] =
originalSource.zipWith(firstTrueSource)(processData)
Using Stateful Flow
You could create a Flow that contains state similar to the example in the question but with a more functional approach:
def firstRunner(firstCall : (Data) => OutputData,
otherCalls : (Data) => OutputData) : (Data) => OutputData = {
var firstRun = true
(data : Data) => {
if(firstRun) {
firstRun = false
firstCall(data)
}
else
otherCalls(data)
}
}//end def firstRunner
def firstRunFlow(firstCall : (Data) => OutputData,
otherCalls : (Data) => OutputData) : Flow[Data, OutputData, _] =
Flow[Data] map firstRunner(firstCall, otherCalls)
This Flow can then be applied to your original Source:
def firstElementFunc(data : Data) : OutputData = ???
def remainingElsFunc(data : Data) : OutputData = ???
val firstSource : Source[OutputData, _] =
originalSource via firstRunFlow(firstElementFunc, remainingElsFunc)
"Idiomatic Way"
Answering your question directly requires dictating the "idiomatic way". I answer that part last because it is the least verifiable by the compiler and is therefore closer to opinion. I would never claim to be a valid classifier of idiomatic code.
My personal experience with akka-streams has been that it is best to switch my perspective to imagining an actual stream (I think of a train with boxcars) of Data elements. Do I need to break it up into multiple fixed size trains? Do only certain boxcars make it through? Can I attach another train side-by-side that contains Boolean cars which can signal the front? I would prefer the zipWith method due to my regard of streams (trains). My initial approach is always to use other stream parts connected together.
Also, I find it best to embed as little code in an akka Stream component as possible. firstTrueIterator and processData have no dependency on akka at all. Conversely, the firstTrueSource and contingentSource definitions have virtually no logic. This allows you to test the logic independently of a clunky ActorSystem, and the guts can be reused in Futures or Actors.
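Because the logic is free of akka, it can be exercised with plain collections. The sketch below reuses firstTrueIterator from above and pairs it with a hypothetical processData over Int (the real Data and OutputData types are not given); `processAll` is an invented helper that mimics what the zipWith stage does.

```scala
object FirstElementLogic {
  // Same iterator as in the answer: true once, then false forever.
  def firstTrueIterator(): Iterator[Boolean] =
    Iterator.single(true) ++ Iterator.continually(false)

  // Hypothetical processing step: tag the first element differently.
  def processData(data: Int, firstRun: Boolean): String =
    if (firstRun) s"First: $data" else s"Rest: $data"

  // Plain-collection analogue of
  // originalSource.zipWith(firstTrueSource)(processData).
  // zip stops as soon as the finite side is exhausted.
  def processAll(data: List[Int]): List[String] =
    data.iterator
      .zip(firstTrueIterator())
      .map { case (d, first) => processData(d, first) }
      .toList
}
```

A unit test can then call `FirstElementLogic.processAll(List(1, 2, 3))` without starting an ActorSystem.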
You can use prepend to put one source in front of another. Just prepend a single-item source; once it is drained, the rest of the original source continues.
https://doc.akka.io/docs/akka/current/stream/operators/Source-or-Flow/prepend.html
Source(List(1, 2, 3))
.prepend(Source.single(0))
.runWith(Sink.foreach(println))
0
1
2
3
While I prefer the approach with zip, one can also use statefulMapConcat:
source
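.statefulMapConcat { () =>
// the state is created once per materialization
var firstRun = true
elem => {
if (firstRun) {
firstRun = false
// handle the first element specially
List(elem)
} else {
// handle the remaining elements normally
List(elem)
}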
}
}

Scala on Eclipse gives errors on Map operations

I am trying to write a word Count program using Maps in Scala. From various sources on the internet, I found that 'contains', adding elements to the Map using '+' and updating the existing values are valid. But Eclipse gives me errors when I try to use those operations in my code:
object wc {
def main(args:Array[String])={
val story = """ Once upon a time there was a poor lady with a son who was lazy
she was worried how she will grow up and
survive after she goes """
count(story.split("\n ,.".toCharArray()))
}
def count(s:Array[String])={
var count = scala.collection.mutable.Map
for(i <- 0 until s.size){
if(count.contains(s(i))) {
count(s(i)) = count(s(i))+1
}
else count = count + (s(i),1)
}
println(count)
}
}
These are the error messages I get in Eclipse:
[three error screenshots, not preserved]
I tried these operations on REPL and they were working fine without any errors. Any help would be appreciated. Thank you!
You need to instantiate a typed mutable Map (otherwise you're looking for the contains attribute on Map.type; which isn't there):
def count(s: Array[String]) ={
var count = scala.collection.mutable.Map[String, Int]()
for(i <- 0 until s.size){
if (count.contains(s(i))) {
// count += s(i) -> (count(s(i)) + 1)
// can be rewritten as
count(s(i)) += 1
}
else count += s(i) -> 1
}
println(count)
}
Note: I also fixed up the lines updating count.
Perhaps this is better written as a groupBy:
a.groupBy({s: String => s}).mapValues(_.length)
val a = List("a", "a", "b", "c", "c", "c")
scala> a.groupBy({s: String => s}).mapValues(_.length)
Map("b" -> 1, "a" -> 2, "c" -> 3): Map[String, Int]
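Applied to the word count from the question, the groupBy approach could look roughly like this. It is a sketch: the `WordCount` object and the splitting regex are assumptions, not part of the original code. It uses map instead of mapValues because mapValues on a Map is deprecated as of Scala 2.13.

```scala
object WordCount {
  // Splits on runs of whitespace, commas and periods, then counts
  // how often each word occurs.
  def count(text: String): Map[String, Int] =
    text
      .split("[\\s,.]+")
      .filter(_.nonEmpty) // leading separators yield an empty first token
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }
}
```

For example, `WordCount.count(" a b a, c. a b")` counts three occurrences of "a", two of "b" and one of "c".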

Is there an Rx operator for combining the latest from streams 1 and 2 only when stream 2 emits things?

Here's my attempt at drawing the marble diagram --
STREAM 1 = A----B----C---------D------>
(magical operator)
STREAM 2 = 1----------2-----3-----4--->
STREAM 3 = 1A---------2C----3C----4D-->
I am basically looking for something that generates stream 3 from streams 1 and 2. Basically, whenever something is emitted from stream 2, it combines it with the latest from stream 1. combineLatest is similar to what I want but I only want things emitted from stream 3 when something is emitted from stream 2, not stream 1. Does an operator like this exist?
There is an operator that does what you need: One overload of sample takes another observable instead of duration as a parameter. The documentation is here: https://github.com/ReactiveX/RxJava/wiki/Filtering-Observables#sample-or-throttlelast
The usage (I'll give examples in scala):
import rx.lang.scala.Observable
import scala.concurrent.duration
import duration._
def o = Observable.interval(100.milli)
def sampler = Observable.interval(180.milli)
// Often, you just need the sampled observable
o.sample(sampler).take(10).subscribe(x ⇒ println(x + ", "))
Thread.sleep(2000)
// or, as for your use case
o.combineLatest(sampler).sample(sampler).take(10).subscribe(x ⇒ println(x + ", "))
Thread.sleep(2000)
The output:
0,
2,
4,
6,
7,
9,
11,
13,
15,
16,
(2,0),
(4,1),
(6,2),
(7,3),
(9,4),
(11,5),
(13,6),
(15,7),
(16,8),
(18,9),
There is a slight catch in that duplicate entries from the sampled observable are swallowed (see discussion at https://github.com/ReactiveX/RxJava/issues/912). Other than that, I think it is exactly what you are looking for.
withLatestFrom seems to fit exactly what I was looking for - http://rxmarbles.com/#withLatestFrom
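To make the semantics concrete, here is a small simulation of the marble diagram using plain Scala collections instead of an Rx library. `WithLatestFromSketch`, the `Event` encoding and `combine` are hypothetical names for this illustration and not part of any Rx API; the timestamps are read off the diagram approximately.

```scala
object WithLatestFromSketch {
  // A timestamped event from either stream:
  // Left = stream 1 (letters), Right = stream 2 (numbers).
  type Event = (Int, Either[String, Int])

  // Emits only when stream 2 fires, pairing its value with the most
  // recent value of stream 1. Stream 2 events that arrive before
  // stream 1 has emitted anything are dropped.
  def combine(events: Seq[Event]): List[String] = {
    val start = (Option.empty[String], List.empty[String])
    val (_, out) = events.sortBy(_._1).foldLeft(start) {
      case ((_, acc), (_, Left(letter))) => (Some(letter), acc)
      case ((latest, acc), (_, Right(n))) =>
        (latest, latest.map(l => s"$n$l" :: acc).getOrElse(acc))
    }
    out.reverse
  }

  // The two input streams from the question, merged into one timeline.
  val marble: Seq[Event] = Seq(
    (0, Left("A")), (0, Right(1)), (5, Left("B")), (10, Left("C")),
    (11, Right(2)), (17, Right(3)), (20, Left("D")), (23, Right(4))
  )
}
```

`WithLatestFromSketch.combine(WithLatestFromSketch.marble)` reproduces STREAM 3 from the question: new letters alone never trigger an emission, only numbers do.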
As far as I know there isn't a single existing operator that will do what you want. However you can compose one by using CombineLatest and DistinctUntilChanged as follows:
var joined = Observable.CombineLatest(sourceA, sourceB, (a,b) => new { A = a, B = b })
.DistinctUntilChanged(pair => pair.B);
EDIT:
The above will work as long as the values for STREAM 1 change each time. If they do not, then use the following, which is less clear but works in all situations (that I've tested, anyway).
var joined = Observable.Join(
sourceB,
sourceA,
_ => Observable.Return(Unit.Default),
_ => sourceA,
(a, b) => new { A = a, B = b });
The Join operator is never intuitive to me, the best explanation I've found is here.
In answer to @Matthew's comment:
var buttonClicks = Observable.FromEventPattern<MouseButtonEventArgs>(this,
"MouseLeftButtonDown")
.Select(_ => Unit.Default);
var sequence = Observable.Interval(TimeSpan.FromSeconds(1));
var joined = Observable.Join(
buttonClicks,
sequence,
_ => Observable.Return(Unit.Default),
_ => sequence,
(b, s) => s); // No info in button click here
Here is a fairly simple way to do it:
var query = stream2.Zip(
stream1.MostRecent(' '),
(s2,s1) => string.Format("{0}{1}", s2, s1));
MostRecent can be supplied a "zero" value which is used in the event stream1 has not emitted yet. This could be null for reference types, but I used a char for stream1 so supplied a space.
I think that the Switch operator is the key here.
Try this:
var query =
stream1
.Select(s1 => stream2.Select(s2 => new { s1, s2 }))
.Switch();
The following test code:
query
.Select(s => String.Format("{0}{1}", s.s2, s.s1))
.Subscribe(Console.WriteLine);
stream1.OnNext('A');
stream2.OnNext(1);
stream1.OnNext('B');
stream1.OnNext('C');
stream2.OnNext(2);
stream2.OnNext(3);
stream1.OnNext('D');
stream2.OnNext(4);
Gives these results:
1A
2C
3C
4D
Please let me know if this is correct.
A solution
public static IObservable<TR> Sample<TSource, TSampler, TR>
(this IObservable<TSource> source,
IObservable<TSampler> sampler,
Func<TSource, TSampler, TR> combiner)
{
return source.Publish
(rs => sampler
.Zip
( rs.MostRecent(default(TSource))
, (samplerElement, sourceElement)
=> combiner(sourceElement, samplerElement)
)
.SkipUntil(rs)
);
}
with a test case because this kind of thing is tricky to get right.
public class SampleSpec : ReactiveTest
{
TestScheduler _Scheduler = new TestScheduler();
[Fact]
public void ShouldWork()
{
var sampler = _Scheduler.CreateColdObservable
( OnNext(10, "A")
, OnNext(20, "B")
, OnNext(30, "C")
, OnNext(40, "D")
, OnNext(50, "E")
, OnNext(60, "F")
);
var source = _Scheduler.CreateColdObservable
( Enumerable
.Range(5,100)
.Where(i=>i%10!=0)
.Select(i=>OnNext(i,i)).ToArray());
var sampled = source.Sample
(sampler, Tuple.Create);
var actual = _Scheduler.Start
(() =>
sampled
, created: 0
, subscribed: 1
, disposed: 1000);
actual.Messages.Count()
.Should()
.Be(6);
var messages = actual.Messages.Take(6)
.Select(v => v.Value.Value)
.ToList();
messages[0].Should().Be(Tuple.Create(9,"A"));
messages[1].Should().Be(Tuple.Create(19,"B"));
messages[2].Should().Be(Tuple.Create(29, "C"));
messages[3].Should().Be(Tuple.Create(39, "D"));
messages[4].Should().Be(Tuple.Create(49, "E"));
messages[5].Should().Be(Tuple.Create(59, "F"));
}
}