RxJava: why are the same transformations recomputed for each observable branch?

Introduction
Consider a simple piece of Java code. It defines two observables a and b in terms of c, which itself is defined using d (a, b, c and d all have type Observable<Integer>):
d = Observable.range(1, 10);
c = d.map(t -> t + 1);
a = c.map(t -> t + 2);
b = c.map(t -> t + 3);
This code can be visualised using a diagram where each arrow (-->) represents a transformation (the map method):
          .--> a
d --> c --|
          '--> b
If several chains of observables share a common part, then (in theory) new values of that common part need to be calculated only once. In the example above: every new d value could be transformed into c (d --> c) only once and the result used for both a and b.
Question
In practice I observe that the transformation is calculated separately for each chain in which it is used (test). In other words, the example above would be more accurately drawn like this:
d --> c --> a
d --> c --> b
In the case of resource-consuming transformations, every new subscription at the end of a chain causes the whole chain to be recomputed (a performance penalty).
Is there a proper way to force a transformation result to be cached and computed only once?
My research
I found two solutions for this problem:
Pass unique identifiers together with the values and store transformation results in some storage external to the Rx library.
Use a Subject to implement a map-like function which hides the start of the observable chain. MapOnce code; test.
Both work. The second is simple but smells like a hack.

You've identified hot and cold observables.
Observable.range returns a cold observable, though you're describing the resulting queries in a hierarchy as if they're hot; i.e., as if they'd share subscription side effects. They do not. Each time that you subscribe to a cold observable it may cause side effects. In your case, each time that you subscribe to range (or to queries established on range) it generates a range of values.
In the second point of your research, you've identified how to convert a cold observable into a hot observable; namely, using Subjects. (Though in .NET you don't use a Subject<T> directly; instead, you'd use an operator like Publish. I suspect RxJava has a similar operator and I'd recommend using it.)
Additional Details
The definition of hot by my interpretation, as described in detail in my blog post linked above, is when an observable doesn't cause any subscription side effects. (Note that a hot observable may multicast connection side effects when converting from cold to hot, but temperature only refers to the propensity of an observable to cause subscription side effects because that's all we really care about when talking about an observable's temperature in practice.)
The map operator (Select in .NET, mentioned in the conclusion of my blog post) returns an observable that inherits the temperature of its source, so in your bottom diagram c, a and b are cold because d is cold. If, hypothetically, you were to apply publish to d, then c, a and b would inherit the hot temperature from the published observable, meaning that subscribing to them wouldn't cause any subscription side effects. Thus publishing d converts a cold observable, namely range, into a hot observable.
    .--> c --> a
d --|
    '--> c --> b
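In RxJava terms, a rough sketch of this hypothetical case might look like the following (assuming RxJava 1.x, where publish() returns a ConnectableObservable; the doOnNext probe and the class name are added purely for illustration). Note that the map producing c still runs once per branch, because only d's subscription side effects are shared:
import rx.Observable;
import rx.observables.ConnectableObservable;

public class PublishDExample {
    public static void main(String[] args) {
        // Publish d: range emits only once, no matter how many observers there are.
        ConnectableObservable<Integer> d = Observable.range(1, 10).publish();

        // c is still a separate computation per branch; the probe prints twice per value.
        Observable<Integer> c = d.map(t -> t + 1)
                .doOnNext(i -> System.out.println("c computed: " + i));
        Observable<Integer> a = c.map(t -> t + 2);
        Observable<Integer> b = c.map(t -> t + 3);

        a.subscribe(i -> System.out.println("a: " + i));
        b.subscribe(i -> System.out.println("b: " + i));

        d.connect(); // subscriptions are wired up; let the source emit
    }
}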
However, your question was about how to share the computation of c as well as d. Even if you were to publish d, c would still be recomputed for both a and b for each notification from d. Instead, you want to share the results of c between a and b. I call an observable whose computation side effects you want to share "active". (I borrowed the term from the passive & active terminology used in neuroscience to describe electrochemical currents in neurons.)
In your top diagram, you're considering c to be active because it causes significant computation side effects, by your own interpretation. Note that c is active regardless of the temperature of d. To share the computation side effects of an active observable, perhaps surprisingly, you must use publish just like for a cold observable. This is because technically active computations are side effects in the same sense as cold observables, while passive computations have no side effects, just like hot observables. I've restricted the terms hot and cold to only refer to the initial computation side effects, which I call subscription side effects, because that's how people generally use them. I've introduced new terms, active and passive, to refer to computation side effects separately from subscription side effects.
The result is that these terms in practice just blend together intuitively. If you want to share the computation side effects of c, then simply publish it instead of d. By doing so, a and b implicitly become hot because map inherits subscription side effects, as stated previously. Therefore, you're effectively making the right side of the observable hot by publishing either d or c, but publishing c also shares its computation side effects.
If you publish c instead of d, then d remains cold, but it doesn't matter since c hides d from a and b. So by publishing c you're effectively publishing d as well. Therefore, applying publish anywhere within your observable makes the right side of the observable effectively hot. It doesn't matter where you introduce publish or how many observers or pipelines you're creating on the right side of the observable. However, choosing to publish c instead of d also shares the computation side effects of c, which technically completes the answer to your question. Q.E.D.
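A minimal RxJava sketch of that conclusion (again assuming RxJava 1.x; class and variable names are illustrative): publishing c instead of d multicasts c's results, so the map that produces c runs only once per source value, and a and b both consume the shared output:
import rx.Observable;
import rx.observables.ConnectableObservable;

public class PublishCExample {
    public static void main(String[] args) {
        Observable<Integer> d = Observable.range(1, 10);

        // Publish c: the t -> t + 1 map is computed once per source value
        // and its results are multicast to every downstream observer.
        ConnectableObservable<Integer> c = d.map(t -> t + 1).publish();

        Observable<Integer> a = c.map(t -> t + 2);
        Observable<Integer> b = c.map(t -> t + 3);

        a.subscribe(i -> System.out.println("a: " + i));
        b.subscribe(i -> System.out.println("b: " + i));

        // Everything is wired up; connecting lets the (cold) d emit exactly once.
        c.connect();
    }
}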

An Observable is lazily executed each time it is subscribed to (either explicitly or implicitly via composition).
This code shows how the source emits for a, b, and c:
Observable<Integer> d = Observable.range(1, 10)
    .doOnNext(i -> System.out.println("Emitted from source: " + i));
Observable<Integer> c = d.map(t -> t + 1);
Observable<Integer> a = c.map(t -> t + 2);
Observable<Integer> b = c.map(t -> t + 3);
a.forEach(i -> System.out.println("a: " + i));
b.forEach(i -> System.out.println("b: " + i));
c.forEach(i -> System.out.println("c: " + i));
If you are okay with buffering (caching) the result, then it is as simple as using the .cache() operator to achieve this.
Observable<Integer> d = Observable.range(1, 10)
    .doOnNext(i -> System.out.println("Emitted from source: " + i))
    .cache();
Observable<Integer> c = d.map(t -> t + 1);
Observable<Integer> a = c.map(t -> t + 2);
Observable<Integer> b = c.map(t -> t + 3);
a.forEach(i -> System.out.println("a: " + i));
b.forEach(i -> System.out.println("b: " + i));
c.forEach(i -> System.out.println("c: " + i));
Adding the .cache() to the source makes it so it only emits once and can be subscribed to many times.
For large or infinite data sources caching is not an option so multicasting is the solution to ensure the source only emits once.
The publish() and share() operators are a good place to start, but for simplicity, and since this is a synchronous example, I'll show the publish(function) overload, which is often the easiest to use.
Observable<Integer> d = Observable.range(1, 10)
    .doOnNext(i -> System.out.println("Emitted from source: " + i))
    .publish(oi -> {
        Observable<Integer> c = oi.map(t -> t + 1);
        Observable<Integer> a = c.map(t -> t + 2);
        Observable<Integer> b = c.map(t -> t + 3);
        return Observable.merge(a, b, c);
    });
d.forEach(System.out::println);
If a, b, c are wanted individually then we can wire everything up and "connect" the source when ready:
private static void publishWithConnect() {
    ConnectableObservable<Integer> d = Observable.range(1, 10)
        .doOnNext(i -> System.out.println("Emitted from source: " + i))
        .publish();

    Observable<Integer> c = d.map(t -> t + 1);
    Observable<Integer> a = c.map(t -> t + 2);
    Observable<Integer> b = c.map(t -> t + 3);

    a.forEach(i -> System.out.println("a: " + i));
    b.forEach(i -> System.out.println("b: " + i));
    c.forEach(i -> System.out.println("c: " + i));

    // now that we've wired up everything we can connect the source
    d.connect();
}
Or if the source is async we can use refCounting:
Observable<Integer> d = Observable.range(1, 10)
    .doOnNext(i -> System.out.println("Emitted from source: " + i))
    .subscribeOn(Schedulers.computation())
    .share();
However, refCount (share() is shorthand for publish().refCount()) allows race conditions, so it won't guarantee that all subscribers get the first values. It is usually only wanted for "hot" streams where subscribers come and go. For a "cold" source that we want to ensure everyone receives in full, the previous solutions with cache() or publish()/publish(function) are the preferred approach.
You can learn more here: https://github.com/ReactiveX/RxJava/wiki/Connectable-Observable-Operators

Related

How can a+b be NOT equal to b+a?

Our professor said that in computer logic the order in which you add one number to another matters, so a+b and b+a are not always equal.
However, I couldn't find an example of when they would be different, or why they wouldn't be equal.
I think it has something to do with bits, but then again, I'm not sure.
Although you don't share a lot of context, it sounds as if your professor did not elaborate on that, or you missed something.
In the case that he was talking about logic in general, he could have meant that the behavior of the + operator depends on how you define it.
Example: The definition (+) a b := if (a==0) then 5 else 0 results in a + operator which is not commutative, e.g. 1 + 0 would be 0 but 0 + 1 would be 5. There are many programming languages that allow this kind of redefinition (overloading) of standard operators.
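As a rough illustration in Java (which has no user-defined operators, so the redefined + is modelled as an ordinary method; the names OddPlus and plus are made up for this sketch):
public class OddPlus {
    // A deliberately strange "+": the result depends only on the first operand.
    static int plus(int a, int b) {
        return (a == 0) ? 5 : 0;
    }

    public static void main(String[] args) {
        System.out.println(plus(1, 0)); // 0
        System.out.println(plus(0, 1)); // 5 -> this "+" is not commutative
    }
}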
But with the context you share, this is all speculative.
One obscure possibility is if one or the other of a or b is a high-resolution timer value (ticks since program start).
Due to the CPU cycle(s) consumed to pop one of the values before the addition, it's possible the sum could differ depending on the order.
One more possibility is if a and b are expressions with side effects. E.g.
int x = 0;

int a() {
    x += 1;
    return x;
}

int b() {
    return x;
}
a() + b() will return 2 and b() + a() will return 1 (both starting from the initial state, and assuming left-to-right evaluation of the operands, as in Java).
Or it could be that a or b are NaN, in which case even a == a is false. Though this one isn't connected with "when you add a number to another".
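A quick Java illustration of the NaN case (the class name is just for the example):
public class NanDemo {
    public static void main(String[] args) {
        double a = Double.NaN;
        double b = 1.0;
        System.out.println(a == a);         // false: NaN is not equal to anything, not even itself
        System.out.println(a + b == b + a); // also false, because both sides evaluate to NaN
    }
}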

problems generating interval information

Given a binary function over time, I am trying to extract information about the intervals occurring in this function.
E.g. I have the states a and b, and the following function:
a, a, b, b, b, a, b, b, a, a
Then I would want facts interval(Start, Length, Value) like this:
interval(0, 2, a)
interval(2, 3, b)
interval(5, 1, a)
interval(6, 2, b)
interval(8, 2, a)
Here is what I got so far:
time(0..9).
duration(1..10).
value(a;b).
1{ function(T, V): value(V) }1 :- time(T).
interval1(T, Length, Value) :-
    time(T), duration(Length), value(Value),
    function(Ti, Value): Ti >= T, Ti < T + Length, time(Ti).
:- interval1(T, L, V), function(T + L, V).
#show function/2.
#show interval1/3.
This actually works reasonably well, but still not quite correctly. This is my output when I run it with clingo 4.5.4:
function(0,b)
function(1,a)
function(2,b)
function(3,a)
function(4,b)
function(5,a)
function(6,b)
function(7,a)
function(8,b)
function(9,a)
interval1(0,1,b)
interval1(1,1,a)
interval1(2,1,b)
interval1(3,1,a)
interval1(4,1,b)
interval1(5,1,a)
interval1(6,1,b)
interval1(7,1,a)
interval1(8,1,b)
interval1(9,1,a)
interval1(9,10,a)
interval1(9,2,a)
interval1(9,3,a)
interval1(9,4,a)
interval1(9,5,a)
interval1(9,6,a)
interval1(9,7,a)
interval1(9,8,a)
interval1(9,9,a)
which has only one bug: all the intervals starting at T == 9 (except for the one where L == 1) run past the end of the function.
So I tried to add the following constraint, to get rid of those:
:- interval1(T, L, V), not time(T + L - 1).
which in my mind translates to "it is prohibited to have an interval such that T + L - 1 is not a time point"
But now clingo said the problem would be unsatisfiable.
So I tried another solution, which should do the same, but in a little less general way:
:- interval1(T, L, V), T + L > 10.
Which also made the whole thing unsolvable.
Which I really don't understand; I'd expect both of those rules to simply get rid of the intervals that run past the end of the function.
So why do they completely kill all elements of the model?
Also, during my experiments, I replaced the function rule with:
function(
0, a;
1, a;
2, b;
3, b;
4, b;
5, b;
6, a;
7, b;
8, a;
9, a
).
This makes the whole thing unsatisfiable even without the problematic constraints. Why is that?
So yeah ... I guess I fundamentally misunderstood something, and I would be really grateful if someone could tell me what exactly that is.
Best Regards
Uzaku
The programs with constraints are inconsistent because in ASP any program which contains both the fact a. and the constraint :-a. is inconsistent. You are basically saying that a is true, and, at the same time, a cannot be true.
In your case, for example, you have a rule which says that interval1(9,10,a) is true for some function, and, on the other hand, you have a constraint which says that interval1(9,10,a) cannot be true, so you get an inconsistency.
A way to get rid of the undesired intervals would be, for example, to add an extra atom to the definition of an interval, e.g.:
interval1(T, Length, Value) :-
    time(T), duration(Length), value(Value),
    time(T+Length-1), % I added this
    function(Ti, Value): Ti >= T, Ti < T + Length, time(Ti).
Now the program is consistent.
I couldn't reproduce the inconsistency for the specific function you have provided. For me, the following is consistent:
time(0..9).
duration(1..10).
value(a;b).
%1{ function(T, V): value(V) }1 :- time(T).
function(0,a).
function(1,a).
function(2,b).
function(3,b).
function(4,b).
function(5,b).
function(6,a).
function(7,b).
function(8,a).
function(9,a).
interval1(T, Length, Value) :-
    time(T), duration(Length), value(Value),
    time(T+Length-1),
    function(Ti, Value): Ti >= T, Ti < T + Length, time(Ti).
#show function/2.
#show interval1/3.
This is what I get in the output:
$ clingo test 0
clingo version 4.5.4
Reading from test
Solving...
Answer: 1
function(0,a) function(1,a) function(2,b) function(3,b) function(4,b) function(5,b) function(6,a) function(7,b) function(8,a) function(9,a) interval1(0,1,a) interval1(1,1,a) interval1(0,2,a) interval1(6,1,a) interval1(8,1,a) interval1(9,1,a) interval1(8,2,a) interval1(2,1,b) interval1(3,1,b) interval1(2,2,b) interval1(4,1,b) interval1(3,2,b) interval1(2,3,b) interval1(5,1,b) interval1(4,2,b) interval1(3,3,b) interval1(2,4,b) interval1(7,1,b)
SATISFIABLE
Models : 1
Calls : 1
Time : 0.002s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s
We are getting more intervals than needed, since some of them are not maximal, but I am leaving this for you to think about :)
Hope this helps.

Rx: how to merge observables and only fire if all provided an OnNext in a specified timeframe

I have a number of observable event streams that are all providing events with timestamps. I don't care about the events individually; I need to know when they all fired within a specified timeframe.
For example:
Button one was clicked (don't care)
Button two was clicked (don't care)
Button one was clicked and within 5 seconds button two was clicked (I need this)
I tried And/Then/When, but I get old events and can't figure out how to filter them out if they are not within the time window.
Thanks!
Edit:
I attempted to create a marble diagram to clarify what I am trying to achieve...
I have a bunch of random event streams represented in the top portion. Some events fire more often than others. I only want to capture the group of events that fired within a specified time window. In this example I used windows of 3 seconds. The events I want are highlighted in dark black; all other events should be ignored. I hope this better explains the problem.
Is sequence important? If so, the usual way of dealing with "this and then that" is using flatMapLatest. You can achieve your other constraints by applying them to the stream passed to flatMapLatest. Consider the following example:
const fromClick = x => Rx.Observable.fromEvent(document.getElementById(x), 'click');
const a$ = fromClick('button1');
const b$ = fromClick('button2');
const sub = a$.flatMapLatest(() => b$.first().timeout(5000, Rx.Observable.never()))
  .subscribe(() => console.log('condition met'));
Here we're saying "for each click on button 1, start listening to button 2, and return either the first click or nothing (if we hit the timeout)". Here's a working example: https://jsbin.com/zosipociti/edit?js,console,output
You want to use the Sample operator. I'm not sure if you want .NET or JS sample code, since you tagged both.
Edit:
Here's a .NET sample. metaSample is an observable of 10 child observables. Each of the child observables has numbers going from 1 to 99 with random time-gaps between each number. The time gaps are anywhere between 0 to 200 milliseconds.
var random = new Random();
IObservable<IObservable<int>> metaSample = Observable.Generate(1, i => i < 10, i => i + 1, i =>
    Observable.Generate(1, j => j < 100, j => j + 1, j => j, j => TimeSpan.FromMilliseconds(random.Next(200))));
We then Sample each of the child observables every second. This gives us the latest value that occurred in that one-second window. We then merge those sampled streams together:
IObservable<int> joined = metaSample
    .Select(o => o.Sample(TimeSpan.FromSeconds(1)))
    .Merge();
A marble diagram for 5 of them could look like this:
child1: --1----2--3-4---5----6
child2: -1-23---4--5----6-7--8
child3: --1----2----3-4-5--6--
child4: ----1-2--34---567--8-9
child5: 1----2--3-4-5--------6-
t     : ------1------2------3-
------------------------------
result: ------13122--45345--5768--
So after 1 second it grabs the latest value from each child and emits them; after 2 seconds the same. After 3 seconds, notice that child5 hasn't emitted anything in that window, so only 4 numbers are emitted. Obviously with our parameters that's impossible, but it demonstrates how Sample behaves when there are no events in the window.
This is the closest I have come to accomplishing this task... There has to be a cleaner way with GroupJoin, but I can't figure it out!
static void Main(string[] args)
{
    var random = new Random();

    var o1 = Observable.Interval(TimeSpan.FromSeconds(2)).Select(t => "A " + DateTime.Now.ToString("HH:mm:ss"));
    o1.Subscribe(Console.WriteLine);

    var o2 = Observable.Interval(TimeSpan.FromSeconds(3)).Select(t => "B " + DateTime.Now.ToString("HH:mm:ss"));
    o2.Subscribe(Console.WriteLine);

    var o3 = Observable.Interval(TimeSpan.FromSeconds(random.Next(3, 7))).Select(t => "C " + DateTime.Now.ToString("HH:mm:ss"));
    o3.Subscribe(Console.WriteLine);

    var o4 = Observable.Interval(TimeSpan.FromSeconds(random.Next(5, 10))).Select(t => "D " + DateTime.Now.ToString("HH:mm:ss"));
    o4.Subscribe(Console.WriteLine);

    var joined = o1
        .CombineLatest(o2, (s1, s2) => new { e1 = s1, e2 = s2 })
        .CombineLatest(o3, (s1, s2) => new { e1 = s1.e1, e2 = s1.e2, e3 = s2 })
        .CombineLatest(o4, (s1, s2) => new { e1 = s1.e1, e2 = s1.e2, e3 = s1.e3, e4 = s2 })
        .Sample(TimeSpan.FromSeconds(3));

    joined.Subscribe(e => Console.WriteLine($"{DateTime.Now}: {e.e1} - {e.e2} - {e.e3} - {e.e4}"));

    Console.ReadLine();
}

RX and buffering

I'm trying to obtain the following observable (with a buffer capacity of 10 ticks):
Time    0    5    10   15   20   25   30   35   40
        |----|----|----|----|----|----|----|----|
Source   A  B  C  D        E  F  G       H
Result              A                E        H
                    B                F
                    C                G
                    D
Phase    |<------->|-------|<------->|<------->|
              B        I        B         B
That is, the behavior is very similar to the Buffer operator, with the difference that the buffering phase is not aligned to fixed time slots but starts with the first symbol pushed while in the idle phase. I mean, in the example above the buffering phases start with the 'A', 'E', and 'H' symbols.
Is there a way to compose the observable or do I have to implement it from scratch?
Any help will be appreciated.
Try this:
IObservable<T> source = ...;
IScheduler scheduler = ...;
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))
            .Amb(obs.IgnoreElements())));
The buffer closing selector is called once at the start and then once whenever a buffer closes. The selector says "The buffer being started now should be closed duration after the first element of this buffer, or when the source completes, whichever occurs first."
Edit: Based on your comments, if you want to make multiple subscriptions to query share a single subscription to source, you can do that by appending .Publish().RefCount() to the query.
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))
            .Amb(obs.IgnoreElements())))
    .Publish()
    .RefCount();

What is the difference between Reactive programming and plain old closures?

Example from scala.rx:
import rx._
val a = Var(1); val b = Var(2)
val c = Rx{ a() + b() }
println(c()) // 3
a() = 4
println(c()) // 6
How is the above version better than:
var a = 1; var b = 2
def c = a + b
println(c) // 3
a = 4
println(c) // 6
The only thing I can think of is that the first example is efficient in the sense that unless a or b changes, c is not recalculated, whereas in my version c is recomputed every time I invoke it. But that is just a special case of memoization with size = 1, e.g. I can prevent the re-computations using a memoization macro:
var a = 1; var b = 2
@memoize(maxSize = 1) def c(x: Int = a, y: Int = b) = x + y
Is there anything that I am missing to grok about reactive programming that provides insight into why it might be a better paradigm (than memoized closures) in certain cases?
Problem: It's a bad example
The example on the web page doesn't illustrate the purpose of Scala.Rx very well. In that sense it is quite a bad example.
What is Scala.Rx for?
It's about notifications
The idea of Scala.Rx is that a piece of code can get notified when data changes. Usually this notification is used to (re-)calculate a result that depends on the changed data.
Scala.Rx automates the wiring
When the calculation goes over multiple stages, it becomes quite hard to track which intermediate result depends on which data and on which other intermediate results. Additionally, one must recalculate the intermediate results in the correct order.
You can think of this just like a big Excel sheet full of formulas that depend on each other. When you change one of the input values, Excel has to figure out which parts of the sheet must be recalculated, and in which order. When Excel has re-calculated all the changed cells, it can update the display.
Scala.Rx does a similar thing to Excel: it tracks how the formulas depend on each other and notifies the ones that need to be updated, in the correct order.
Purpose: MVC
Scala.Rx is a nice tool for implementing the MVC pattern, especially for business applications that you could also model in Excel.
There is also a variant that works with Scala.js, i.e. that runs in the browser as part of an HTML site. This can be quite useful if you want to dynamically update parts of an HTML page according to changes on the server or edits by the user.
Limitations
Scala.Rx does not scale well when you have huge amounts of input data, e.g. operations on huge matrices.
A better example
import rx._
import rx.ops._
val a = Var(1); val b = Var(2)
val c: Rx[Int] = Rx{ a() + b() }
val o = c.foreach { value =>
  println(s"c has a new value: ${value}")
}
a()=4
b()=12
a()=35
Gives you the following output:
c has a new value: 3
c has a new value: 6
c has a new value: 16
c has a new value: 47
Now imagine instead of printing the value, you will refresh controls in a UI or parts of a HTML page.