Reverse TakeUntil logic using Rx.NET - System.Reactive

First off, for full disclosure I'm a n00b in RX, but I'm learning daily now...
I need to build an Observable that will enable a button (or automatically start an action) as long as the averaged signals coming in from another observable stay within a certain range. As far as I've learned so far, I could do that by adding a .Where to an averaged observable, in which I check that the averaged values I observe (created from an event handler) are in fact within the given range...
What I also need, however, is for these observed values to keep my action/button enabled only until the underlying/inner signals overstep the given range. Is such a thing possible in Rx as a reversed .TakeUntil (with an inverted Where clause), which I now think could solve my problem? Or should I just reuse the original observable, copy it with a negated .Where clause, and use that as another independent observable? If the latter, is there a performance cost in reusing almost identical observables multiple times, changing only a few of their LINQ queries? How many observables is too many?

It seems to me that something like this is sufficient.
var source = Observable.Range(0, 10);

var query =
    source
        .Buffer(4, 1)
        .Select(xs => xs.Average())
        .Select(x => x > 5.0 && x <= 7.0);

query.ObserveOn(button).Subscribe(enabled => button.Enabled = enabled);
(I've assumed Windows Forms.)
This gives me:
False
False
False
False
True
True
False
False
False
False
I can improve it slightly like this:
var query =
    source
        .Buffer(4, 1)
        .Select(xs => xs.Average())
        .Select(x => x > 5.0 && x <= 7.0)
        .StartWith(true)
        .DistinctUntilChanged();
That then gives me:
True
False
True
False
Please let me know if I've missed anything.

Related

Does Scala intelligently terminate calculating OR expressions for fold operations?

Suppose you have a value val list: List[Date]. You would like to know if any one of the dates in this list occurs after some startDate. You could do this:
list.fold(false)((a, b) => startDate.compareTo(b) < 0 || a)
which would return true if any date occurred after startDate, thus achieving our objective.
However, since this is an OR statement being used, if even only one date satisfies the condition startDate.compareTo(b) < 0, then the whole fold operation will return true. Does Scala have a way of terminating execution of the fold and just returning true when it hits it?
This sounds like a use case for exists.
list.exists(a => startDate.compareTo(a) < 0)
https://www.scala-lang.org/api/current/scala/collection/immutable/List.html#exists(p:A=%3EBoolean):Boolean
However, since this is an OR statement being used, if even only one date satisfies the condition startDate.compareTo(b) < 0, then the whole fold operation will return true.
Actually, not necessarily; startDate.compareTo(b) < 0 could throw an exception. You'd need to change the order of operands to (a, b) => a || startDate.compareTo(b) < 0; even then it would be correct for a List, but not e.g. a Stream.
At any rate, as far as I know the answer is no even for the cases where it's correct. fold can't see inside the function it receives, only call it, so it would require specific support for this case in the compiler.
See also: Abort early in a fold and https://users.scala-lang.org/t/breaking-out-of-a-map-or-other-iteration/1091.
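For cases where exists isn't quite enough, here is a minimal sketch (not taken from the linked discussions) of short-circuiting by hand with an explicit recursive helper, assuming java.util.Date as in the question:

import java.util.Date
import scala.annotation.tailrec

// Hand-rolled short-circuiting check: stops walking the list as soon as a
// matching date is found, which a plain fold cannot do.
@tailrec
def anyAfter(startDate: Date, dates: List[Date]): Boolean = dates match {
  case Nil       => false
  case d :: rest => if (startDate.compareTo(d) < 0) true else anyAfter(startDate, rest)
}

// Behaves like: list.exists(d => startDate.compareTo(d) < 0)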

Understanding spark process behaviour

I would like to understand a process's behaviour. Basically this Spark process must create at most five files, one for each territory, and save them into HDFS.
Territories are provided by an array of five strings, but when I look at the Spark UI, I see the same action being executed many times.
These are my questions:
Why has the isEmpty action been executed 4 times for each territory instead of once? I expect just one action per territory.
How is the number of tasks decided when isEmpty is calculated? The first time there is just one task, the second time there are 4, the third 20 and the fourth 35. What is the logic behind that sizing? Can I control that number in some way?
NOTE: if someone has a more, say, big-data solution to accomplish the same process goal, please suggest it.
This is the code excerpt for the Spark process:
class IntegrationStatusD1RequestProcess {

  logger.info(s"Retrieving all measurement point from DB")

  val allMPoints = registryData.createIncrementalRegistryByMPointID()
    .setName("allMPoints")
    .persist(StorageLevel.MEMORY_AND_DISK)

  logger.info("getTerritories return always an array of five String")

  intStatusHelper.getTerritories.foreach { territory =>

    logger.info(s"Retrieving measurement point for territory $territory")

    val intStatusesChanged = allMPoints
      .filter { m => m.getmPoint.substring(0, 3) == territory }
      .setName(s"intStatusesChanged_${territory}")
      .persist(StorageLevel.MEMORY_AND_DISK)

    intStatusesChanged.isEmpty match {
      case true  => logger.info(s"No changes detected for territory")
      case false =>
        // create file and save it into HDFS
    }
  }
}
Screenshots of the Spark UI accompany the question: one showing all the Spark jobs, and two showing the isEmpty stages.
isEmpty is inefficient if you expect it to be true!
Here's the RDD code for isEmpty:
def isEmpty(): Boolean = withScope {
  partitions.length == 0 || take(1).length == 0
}
It calls take. This is an efficient implementation if you think the RDD isn't empty, but is a horrible implementation if you think that it is.
The implementation of take follows this recursive step, starting at parts = 1:
1. Collect the first parts partitions.
2. Check whether this result contains >= n items.
3. If yes, take the first n.
4. If no, repeat step 1 with parts = parts * 4.
This implementation strategy lets the execution short-circuit if the RDD has more elements than you want to take, which is usually true. But if your RDD has fewer elements than you want to take, you end up computing partition #1 log4(nPartitions) + 1 times, partitions #2-4 log4(nPartitions) times, partitions #5-16 log4(nPartitions) - 1 times, and so on.
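As a rough illustration, here is a toy Scala model of the scan-up described above (not Spark's actual source); each round corresponds to a separately submitted job, which is why a single isEmpty call can show up as several jobs in the Spark UI:

// Toy model: how many partitions the described strategy considers per round
// when the RDD turns out to be empty (1, 4, 16, ... capped at the total).
def roundSizes(totalParts: Int): List[Int] =
  Iterator.iterate(1)(_ * 4)
    .map(parts => math.min(parts, totalParts))
    .takeWhile(_ < totalParts)
    .toList :+ totalParts

// roundSizes(60) == List(1, 4, 16, 60)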
A better implementation for this use case
This implementation only computes each partition once by sacrificing short-circuit capability:
def fasterIsEmpty(rdd: RDD[_]): Boolean = {
  rdd.mapPartitions(it => Iterator(it.isEmpty))
    .fold(true)(_ && _)
}
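Applied to the loop from the question, the check might then look something like this (a hypothetical adaptation, assuming fasterIsEmpty is in scope):

// Hypothetical replacement for the isEmpty match inside the territory loop:
if (fasterIsEmpty(intStatusesChanged)) {
  logger.info(s"No changes detected for territory $territory")
} else {
  // create file and save it into HDFS
}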

Remember suchThat clauses when shrinking

If I have a custom generator then the shrinker will remember my suchThat clause and not shrink with invalid values:
val myGen = Gen.identifier.suchThat { _.length > 3 }

// all shrinks have > 3 characters
property("failing case") = forAll (myGen) { (a: String) =>
  println(s"Gen suchThat Value: $a")
  a == "Impossible"
}
If I do something further to the generated value (i.e. map it), then the shrinker "forgets" my suchThat clause:
// the shrinker will shrink all the way down to ""
property("failing case") = forAll (myGen.map{_ + "bbb"}) { (a: String) =>
  println(s"Gen with map Value: $a")
  a == "Impossible"
}
Is it possible to have suchThat constraints propagate through generators? In my real project I am doing more than a simple map, but this seems to be the simplest example of the limitation I am hitting.
I'm fairly certain the answer is no (at least at this point in time).
This is quite annoying, although perhaps not as trivial as it seems. The generator result does attempt to keep track of the sieve, although it gets lost in map and flatMap. Apart from applying the sieve to the result of the shrink, there isn't any other connection back to the generator. Even if there were, all the intermediate results would need to be retained and applied to each sieve at the correct points. That then raises the question: what exactly is being shrunk, the generated result or the original generator(s)?
The only solutions that I have found so far are to either:
Disable shrinking, or
Implement a custom Shrink, or
Add a whenever clause that rechecks the generated value (see the sketch below).
This can be quite challenging, especially when composing multiple generators.
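For the third option, here is a minimal sketch of a whenever-style guard; the ShrinkGuard name is made up, and the propBoolean import assumes a reasonably recent ScalaCheck (older versions spell it Prop.BooleanOperators):

import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.{forAll, propBoolean} // propBoolean enables Boolean ==> Prop

object ShrinkGuard extends Properties("ShrinkGuard") {

  val myGen: Gen[String] = Gen.identifier.suchThat(_.length > 3)

  // Re-assert the invariant inside the property so that shrunk values which
  // no longer satisfy it are discarded rather than reported as minimal failures.
  property("failing case, guarded") =
    forAll(myGen.map(_ + "bbb")) { (a: String) =>
      (a.length > 6) ==> (a == "Impossible")
    }
}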

In Rx (or RxJava/RxScala), how to make an auto-resetting stateful latch map/filter for measuring in-stream elapsed time to touch a barrier?

Apologies if the question is poorly phrased, I'll do my best.
If I have a sequence of values with times as an Observable[(U,T)], where U is a value type and T is a time-like type (or anything difference-able, I suppose), how could I write an operator that acts as an auto-resetting one-touch barrier: it stays silent while abs(u_n - u_reset) < barrier, but emits t_n - t_reset when the barrier is touched, at which point it also resets u_reset = u_n?
That is to say, the first value this operator receives becomes the baseline, and it emits nothing. Henceforth it monitors the values of the stream, and as soon as one of them is beyond the baseline value (above or below), it emits the elapsed time (measured by the timestamps of the events), and resets the baseline. These times then will be processed to form a high-frequency estimate of the volatility.
For reference, I am trying to write a volatility estimator outlined in http://www.amazon.com/Volatility-Trading-CD-ROM-Wiley/dp/0470181990 , where rather than measuring the standard deviation (deviations at regular homogeneous times), you repeatedly measure the time taken to breach a barrier for some fixed barrier amount.
Specifically, could this be written using existing operators? I'm a bit stuck on how the state would be reset, though maybe I need to make two nested operators, one which is one-shot and another which keeps creating that one-shot... I know it could be done by writing one by hand, but then I need to write my own publisher etc etc.
Thanks!
I don't fully understand the algorithm and your variables in the example, but you can use flatMap with some heap-state and return empty() or just() as needed:
int[] var1 = { 0 };

source.flatMap(v -> {
    var1[0] += v;
    if ((var1[0] & 1) == 0) {
        return Observable.just(v);
    }
    return Observable.empty();
});
If you need a per-sequence state because of multiple consumers, you can defer the whole thing:
Observable.defer(() -> {
    int[] var1 = { 0 };
    return source.flatMap(v -> {
        var1[0] += v;
        if ((var1[0] & 1) == 0) {
            return Observable.just(v);
        }
        return Observable.empty();
    });
}).subscribe(...);
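Applying that defer + flatMap-with-state pattern to the barrier described in the question, a rough RxScala sketch could look like the following (the names, the concrete (Double, Long) value/timestamp types and the >= comparison are assumptions, not a tested implementation):

import rx.lang.scala.Observable

// Emits the elapsed time t - tReset whenever |u - uReset| >= barrier, then
// re-baselines on the element that touched the barrier; the very first
// element only establishes the baseline and emits nothing.
def barrierTimes(source: Observable[(Double, Long)], barrier: Double): Observable[Long] =
  Observable.defer {
    var baseline: Option[(Double, Long)] = None // per-subscription heap state
    source.flatMap { case (u, t) =>
      baseline match {
        case None =>
          baseline = Some((u, t)) // first value becomes the baseline
          Observable.empty
        case Some((uReset, tReset)) if math.abs(u - uReset) >= barrier =>
          baseline = Some((u, t)) // barrier touched: reset and emit elapsed time
          Observable.just(t - tReset)
        case _ =>
          Observable.empty // still within the barrier: stay silent
      }
    }
  }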

How to switch from one sequence of events to another?

I would like the first event to arrive to trigger some work immediately. After that I would like to throttle the work down a little. So far I have come up with the following code:
var events = Observable.FromEventPattern<...>(...);
var throttled = events.Throttle(TimeSpan.FromSeconds(1));

events.Take(1).Subscribe((x) =>
{
    DoWork(x);
    throttled.Subscribe((y) => DoWork(y));
});
Is there a more elegant way of expressing it?
Apparently it's quite simple:
var events = Observable.FromEventPattern<...>(...);
var throttled = events.Throttle(TimeSpan.FromSeconds(1));
events.Take(1).Concat(throttled).Subscribe((x) => DoWork(x));
Concat will wait for the first sequence to finish and then move the subscription to the second.
Another common way is to use SelectMany. This allows you to pass data from the first sequence to the second, and it also allows the two sequences to be of different types.
In query comprehension syntax:
var q = from x in xs
        from y in ys
        select new { x, y };
Or you can use the standard LINQ operators as extension methods (losing access to the x value, however):
xs.SelectMany(x => ys)
  .Select(y => y)