windowCount dropping values - reactive-programming

I'm trying to group my observable values into groups using windowCount and, for each value of each group, send a request. Then, concatenate those groups so that the next group's requests do not start before the current group's requests have completed. The problem is that some values get skipped. Here's my code. (I'm not making actual AJAX calls here, but Observable.timer should work for the example.)
Observable.interval(300)
  .take(12)
  .windowCount(3)
  .concatMap(obs => {
    return obs.mergeMap(
      v => Observable.timer(Math.random() * 1500).mapTo(v)
    );
  })
  .do(v => console.log(v))
  .finally(() => console.log('fin'))
  .subscribe();
I tried replacing windowCount by creating the groups manually, and it works perfectly; no values are skipped.
Observable.interval(900)
  .take(4)
  .map(i => Observable.interval(300).take(3).map(j => j + i * 3))
  .concatMap(obs => {
    return obs.mergeMap(
      v => Observable.timer(Math.random() * 1500).mapTo(v)
    );
  })
  .do(v => console.log(v))
  .finally(() => console.log('fin'))
  .subscribe();
I was under the impression that windowCount would group the emitted values the same way. But apparently it does something else.
I would be really thankful for any explanation of its behavior. Thanks!

The missing values are a result of using a hot observable (Observable.interval(300)) that continues to output values that you are not storing for use.
Following is a slightly simplified version of your code that also logs the times that numbers are emitted. I replaced Math.random() with 1 so that the output is deterministic. I have also loaded the code in jsbin for you to try out:
https://jsbin.com/burocu/edit?js,console
Observable.interval(300)
  .do(x => console.log(x + ") hot observable at: " + (x * 300 + 300)))
  .take(12)
  .windowCount(3)
  .do(observe3 => {
    observe3.toArray()
      .subscribe(x => console.log(x + " do window count at: " + (x[2] * 300 + 300)));
  })
  .concatMap(obs => {
    return obs.mergeMap(
      v => Observable.timer(1 * 1500).mapTo(v)
    )
    .do(v => console.log(v + " merge map at: " + (v * 300 + 300 + 1500)));
  })
  .finally(() => console.log('fin windowCount'))
  .subscribe();
It results in the output below. Notice that the hot observable marches on while the other operators are still being processed.
This is what gives you the impression that values are being dropped. You can see that windowCount(3) is doing what you thought, but not when you thought.
"0) hot observable at: 300"
"1) hot observable at: 600"
"2) hot observable at: 900"
"0,1,2 do window count at: 900"
"3) hot observable at: 1200"
"4) hot observable at: 1500"
"5) hot observable at: 1800"
"3,4,5 do window count at: 1800"
"0 merge map at: 1800"
"6) hot observable at: 2100"
"1 merge map at: 2100"
"7) hot observable at: 2400"
"2 merge map at: 2400"
"8) hot observable at: 2700"
"6,7,8 do window count at: 2700"
"9) hot observable at: 3000"
"10) hot observable at: 3300"
"11) hot observable at: 3600"
"9,10,11 do window count at: 3600"
" do window count at: NaN"
"8 merge map at: 4200"
"fin windowCount"
Edit: further explanation...
After windowCount(3) there is a call to concatMap. concatMap is a combination of map and concatAll.
concatAll:
Joins every Observable emitted by the source (a higher-order
Observable), in a serial fashion. It subscribes to each inner
Observable only after the previous inner Observable has completed (emphasis added), and
merges all of their values into the returned observable.
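In other words (this thread is RxJS 5, but the same decomposition can be sketched in RxJava 2; the names below are illustrative only, not from the question):

import io.reactivex.Observable;

// concatMap(f) behaves like map(f) followed by concat:
Observable<Integer> src = Observable.range(1, 3);
Observable<String> viaConcatMap =
    src.concatMap(v -> Observable.just("v" + v));
Observable<String> viaMapThenConcat =
    Observable.concat(src.map(v -> Observable.just("v" + v)));
// Either way, each inner Observable is subscribed to only after the
// previous inner Observable has completed. Values an inner Observable
// emits before that subscription are seen only if the inner Observable
// buffers or replays them; in RxJS 5 the windows from windowCount do not.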
So, looking at the output above, we see that the first windowCount(3) values [0,1,2] are emitted between 1800 and 2400.
Notice that the second windowCount(3) values [3,4,5] are emitted at 1800. concatAll is not ready to subscribe when [3,4,5] is emitted because the previous inner Observable has not completed yet. So these values are effectively dropped.
Next, notice that the previous inner Observable [0,1,2] completes at 2400. concatAll subscribes to the next window at 2400.
The next value to appear is 8, at 2700: values 6 and 7 were emitted at 2100 and 2400, before concatAll subscribed, so they are lost as well. The value 8 is then output by mergeMap at 4200 because of the interval delay of 300 from the subscription start point of 2400 and the timer delay of 1500 (i.e. 2400 + 300 + 1500 = 4200).
After this point the sequence is complete, so no further values are emitted.
Please add a comment if more clarification is needed.

Related

Infinite loop stops

I have googled this a lot and I can only find answers that relate to conditions within the loop being met. I want this loop to run infinitely (hence while 1==1), and I'm testing it at the moment by just leaving it running in Thonny. It runs for variable lengths of time and then just stops. It doesn't exit the program or stop running; it just behaves as if it's waiting for something, but there's nothing that I can see that it's waiting for. The shell doesn't report any errors or report that it has stopped running; it simply stops printing the string in the fourth-line print statement.
I am very new to python and Linux and I have no idea how to debug this problem or where to look for the stopping point. Even running it in debug mode doesn't render any helpful information. Has anyone got any suggestions please?
The only other thing that I have tried outside of what I have said is I have tried running it on a fresh install of Raspberry Pi OS on three different Raspberry Pi 4 Model B computers. It behaves exactly the same on all of them.
while 1 == 1:
    time.sleep(1)
    cnt = 1
    print('One = One loop ' + str(datetime.today()) + ' CNT: ' + str(cnt))
    while Decimal(target_temperature()) - Decimal(0.3) >= Decimal(actual_temperature()) and switch_state() == 'currently not running':
        print('Before heating loop ' + str(datetime.today()))
        try:
            if cnt == 1:
                if Decimal(target_temperature()) - Decimal(0.3) >= Decimal(actual_temperature()) and switch_state() == 'currently not running':
                    print('First heating loop ' + str(datetime.today()))
                    requests.get('http://192.168.1.167/4/on')
                    log_db('On', str(target_temperature()), str(actual_temperature()))
                    time.sleep(225)
                    requests.get('http://192.168.1.167/4/off')
                    log_db('Off', str(target_temperature()), str(actual_temperature()))
                    time.sleep(300)
                    cnt = cnt + 1
            if cnt != 1:
                if Decimal(target_temperature()) - Decimal(0.3) >= Decimal(actual_temperature()) and switch_state() == 'currently not running':
                    print('Second heating loop ' + str(datetime.today()))
                    requests.get('http://192.168.1.167/4/on')
                    log_db('On', str(target_temperature()), str(actual_temperature()))
                    time.sleep(180)
                    requests.get('http://192.168.1.167/4/off')
                    log_db('Off', str(target_temperature()), str(actual_temperature()))
                    time.sleep(300)
        except Exception as e:
            print(e)
Bearing in mind I don't know anything about Python, I will try to help.
1 - The first thing I would do is put the whole program in a try/except block. That way, if anything bad happens, you should be told about it:
try:
    <all your code>
except Exception as e2:
    print('The whole thing errored -> ' + str(e2))
2 - The delays are in seconds? For testing I would change every sleep to 30 so you can see what is going on without getting too bored waiting; when you have it working, change the times back.
3 - I would add some more print('got here!') statements; for example, where you have if cnt == 1, add an else with print('first loop wasnt 1 it was ' + str(cnt)).
4 - Try to make the code easier for you to read; by the time it actually runs it will be so optimized that it won't bear much relation to what you wrote, so write it in whatever way is easiest for you.
5 - You turn it on and then off, but if the 'off' request failed, it would never be turned off. You should assume that one day it will go badly, and that will be the day you get a big bill. Try to stop it if an error occurs by adding another check: if actualTemp > targetTemp, then turn it off.
6 - The HTTP request might take ages; specify a time in seconds you are prepared to wait, e.g. timeout=60.
try:
    while 1 == 1:
        try:
            time.sleep(30)
            targetTemp = Decimal(target_temperature())
            actualTemp = Decimal(actual_temperature())
            switchState = switch_state()
            print('Doing it at ' + str(datetime.now()) + ' target ' + str(targetTemp) + ' actual ' + str(actualTemp) + ' switch ' + switchState)
            if targetTemp - Decimal(0.3) >= actualTemp and switchState == 'currently not running':
                print('too cold, turning it on for a bit!')
                requests.get('http://192.168.1.167/4/on', timeout=60)
                log_db('On', str(targetTemp), str(actualTemp))
            elif actualTemp > targetTemp and switchState != 'currently not running':
                print('too hot, turning it off!')
                requests.get('http://192.168.1.167/4/off', timeout=60)
                log_db('Off', str(targetTemp), str(actualTemp))
            else:
                print('Not doing anything!')
        except Exception as e1:
            print('Loop errored -> ' + str(e1))
except Exception as e2:
    print('Whole thing errored -> ' + str(e2))
Thanks Billy the Kid. You were right. Sometimes the devices that the loop uses via HTTP requests just don't respond (the two functions use HTTP requests), and sometimes they create errors that aren't caught in the loop. Putting the whole thing in a try/except oddly did identify that. Problem solved.

Depth First Search Implementation - understanding swift code

I was going through a few tutorials on the tree data structure and I found this code, which I find really confusing. Please explain.
public func forEachDepthFirst(visit: (TreeNode) -> Void) {
    visit(self) // 1
    children.forEach { // 2
        $0.forEachDepthFirst(visit: visit)
    }
}
Why do we have visit(self) here?
I see an explanation here https://forums.raywenderlich.com/t/help-understanding-the-recursion-for-depth-first-traversal/56552/2 but it's still not clear.
Any recursive method has:
1 - A base case, which ends the run; here it is
children.forEach // when the children property is empty, meaning a leaf node
2 - A recursive case:
$0.forEachDepthFirst(visit: visit) // call the same method for each child
Your method takes a closure/completion that will be called for every node under the main root node.
So suppose you have this root:
0
- 1
  - 1.1, 1.2, 1.3
- 2
  - 2.1, 2.2, 2.3
When you run your function on node 0:
visit(0)
children.forEach { // = 1, 2
For 0 > 1:
visit(1)
children.forEach { // = 1.1, 1.2, 1.3
For 0 > 2:
visit(2)
children.forEach { // = 2.1, 2.2, 2.3
Inner case, for 0 > 1 > 1.1:
visit(1.1)
children.forEach { // ends here, as there are no children (leaf node)
and so on for 1.2 and 1.3.
For 0 > 2 > 2.1 / 2.2 / 2.3, it's the same as the case above.
How to call
Your method is an instance method of the tree node, so every node can call it. If you want to traverse the nodes under 0, do this:
zeroNode.forEachDepthFirst { (item) in
    print(item.name) // suppose the node object has a name
}
Then you will get
0, 1, 1.1, 1.2, 1.3, 2, 2.1, 2.2, 2.3
That's because you called visit(NodeObject) for the main node and then, recursively, for all of its children.
Why do we have visit(self) here?
Because if we didn't, we would never actually do anything to any of the nodes on the tree!
Consider this tree:
n1 -> n2 -> n3 -> n4
We now call our method forEachDepthFirst on n1. If we didn't have visit(self), we would immediately call forEachDepthFirst on n2, which would call it on n3, which would call it on n4. And then we'd stop. But at no time would we have called visit, so we would have looped through every node in the tree without doing anything to those nodes.
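If it helps to see the whole mechanism end to end, here is a rough transliteration of the same pattern into Java (the TreeNode class and names are mine, for illustration only, not from the tutorial):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode add(String childName) {
        TreeNode child = new TreeNode(childName);
        children.add(child);
        return child;
    }

    // Same shape as the Swift method: visit the current node first,
    // then recurse into each child. A leaf node has an empty children
    // list, so the loop body never runs and the recursion stops there.
    void forEachDepthFirst(Consumer<TreeNode> visit) {
        visit.accept(this);                  // visit(self)
        for (TreeNode child : children) {
            child.forEachDepthFirst(visit);  // $0.forEachDepthFirst(visit: visit)
        }
    }

    public static void main(String[] args) {
        // n1 -> n2 -> n3 -> n4, as in the example above
        TreeNode n1 = new TreeNode("n1");
        n1.add("n2").add("n3").add("n4");
        n1.forEachDepthFirst(n -> System.out.println(n.name)); // n1 n2 n3 n4
    }
}

Comment out visit.accept(this) and main prints nothing at all, which is exactly the point: the recursion still walks every node, but no node is ever visited.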

What is the purpose of Flux::sampleTimeout method in the project-reactor API?

The Java docs say the following:
Emit the last value from this Flux only if there were no new values emitted during the time window provided by a publisher for that particular last value.
However, I found the above description confusing. I read in the Gitter chat that it's similar to debounce in RxJava. Can someone please illustrate it with an example? I could not find one anywhere after doing a thorough search.
sampleTimeout lets you associate a companion Flux X' to each incoming value x in the source. If X' completes before the next value is emitted in the source, then value x is emitted. If not, x is dropped.
The same processing is applied to subsequent values.
Think of it as splitting the original sequence into windows delimited by the start and completion of each companion flux. If two windows overlap, the value that triggered the first one is dropped.
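To make this concrete, here is a small Reactor sketch (the values and timings are hypothetical, not from the question). Each value starts a 100ms companion timer, and a value survives only if its companion completes before the next value arrives:

import java.time.Duration;
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// The source emits 1, 2, 3, 4, 5 at roughly t = 0, 50, 300, 350, 600 ms.
Flux<Integer> source = Flux.concat(
    Mono.just(1),
    Mono.just(2).delayElement(Duration.ofMillis(50)),
    Mono.just(3).delayElement(Duration.ofMillis(250)),
    Mono.just(4).delayElement(Duration.ofMillis(50)),
    Mono.just(5).delayElement(Duration.ofMillis(250)));

List<Integer> out = source
    // 1's companion is still running when 2 arrives, so 1 is dropped;
    // likewise 3 is dropped when 4 arrives 50ms later.
    .sampleTimeout(x -> Mono.delay(Duration.ofMillis(100)))
    .collectList()
    .block();

System.out.println(out); // [2, 4, 5] (5 is flushed when the source completes)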
On the other hand, you have sample(Duration), which only deals with a single companion Flux. It splits the sequence into contiguous windows at a regular time period and drops all but the last element emitted during a particular window.
(edit): about your use case
If I understand correctly, it looks like you have processing of varying length that you want to schedule periodically, but you also don't want to consider values whose processing takes more than one period?
If so, it sounds like you want to 1) isolate your processing in its own thread using publishOn, and 2) simply use sample(Duration) for the second part of the requirement (the delay allocated to a task does not change).
Something like this:
List<Long> passed =
    //regular scheduling:
    Flux.interval(Duration.ofMillis(200))
        //this is only to show that processing is indeed started regularly
        .elapsed()
        //this is to isolate the blocking processing
        .publishOn(Schedulers.elastic())
        //blocking processing itself
        .map(tuple -> {
            long l = tuple.getT2();
            int sleep = l % 2 == 0 || l % 5 == 0 ? 100 : 210;
            System.out.println(tuple.getT1() + "ms later - " + tuple.getT2() + ": sleeping for " + sleep + "ms");
            try {
                Thread.sleep(sleep);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return l;
        })
        //this is where we say "drop if too long"
        .sample(Duration.ofMillis(200))
        //the rest is to make it finite and print the processed values that passed
        .take(10)
        .collectList()
        .block();
System.out.println(passed);
Which outputs:
205ms later - 0: sleeping for 100ms
201ms later - 1: sleeping for 210ms
200ms later - 2: sleeping for 100ms
199ms later - 3: sleeping for 210ms
201ms later - 4: sleeping for 100ms
200ms later - 5: sleeping for 100ms
201ms later - 6: sleeping for 100ms
196ms later - 7: sleeping for 210ms
204ms later - 8: sleeping for 100ms
198ms later - 9: sleeping for 210ms
201ms later - 10: sleeping for 100ms
196ms later - 11: sleeping for 210ms
200ms later - 12: sleeping for 100ms
202ms later - 13: sleeping for 210ms
202ms later - 14: sleeping for 100ms
200ms later - 15: sleeping for 100ms
[0, 2, 4, 5, 6, 8, 10, 12, 14, 15]
So the blocking processing is triggered approximately every 200ms, and only values that were processed within 200ms are kept.

Twitter's Future.collect not working concurrently (Scala)

Coming from a node.js background, I am new to Scala and I tried using Twitter's Future.collect to perform some simple concurrent operations. But my code shows sequential behavior rather than concurrent behavior. What am I doing wrong?
Here's my code,
import com.twitter.util.Future

def waitForSeconds(seconds: Int, container: String): Future[String] = Future[String] {
  Thread.sleep(seconds * 1000)
  println(container + ": done waiting for " + seconds + " seconds")
  container + " :done waiting for " + seconds + " seconds"
}

def mainFunction: String = {
  val allTasks = Future.collect(Seq(waitForSeconds(1, "All"), waitForSeconds(3, "All"), waitForSeconds(2, "All")))
  val singleTask = waitForSeconds(1, "Single")
  allTasks onSuccess { res =>
    println("All tasks succeeded with result " + res)
  }
  singleTask onSuccess { res =>
    println("Single task succeeded with result " + res)
  }
  "Function Complete"
}

println(mainFunction)
println(mainFunction)
and this is the output I get,
All: done waiting for 1 seconds
All: done waiting for 3 seconds
All: done waiting for 2 seconds
Single: done waiting for 1 seconds
All tasks succeeded with result ArraySeq(All :done waiting for 1 seconds, All :done waiting for 3 seconds, All :done waiting for 2 seconds)
Single task succeeded with result Single :done waiting for 1 seconds
Function Complete
The output I expect is,
All: done waiting for 1 seconds
Single: done waiting for 1 seconds
All: done waiting for 2 seconds
All: done waiting for 3 seconds
All tasks succeeded with result ArraySeq(All :done waiting for 1 seconds, All :done waiting for 3 seconds, All :done waiting for 2 seconds)
Single task succeeded with result Single :done waiting for 1 seconds
Function Complete
Twitter's futures are more explicit about where computations are executed than the Scala standard library futures. In particular, Future.apply will capture exceptions safely (like s.c.Future), but it doesn't say anything about which thread the computation will run in. In your case the computations are running in the main thread, which is why you're seeing the results you're seeing.
This approach has several advantages over the standard library's future API. For one thing it keeps method signatures simpler, since there's not an implicit ExecutionContext that has to be passed around everywhere. More importantly it makes it easier to avoid context switches (here's a classic explanation by Brian Degenhardt). In this respect Twitter's Future is more like Scalaz's Task, and has essentially the same performance benefits (described for example in this blog post).
The downside of being more explicit about where computations run is that you have to be more explicit about where computations run. In your case you could write something like this:
import com.twitter.util.{ Future, FuturePool }

val pool = FuturePool.unboundedPool

def waitForSeconds(seconds: Int, container: String): Future[String] = pool {
  Thread.sleep(seconds * 1000)
  println(container + ": done waiting for " + seconds + " seconds")
  container + " :done waiting for " + seconds + " seconds"
}
This won't produce exactly the output you're asking for ("Function complete" will be printed first, and allTasks and singleTask aren't sequenced with respect to each other), but it will run the tasks in parallel on separate threads.
(As a footnote: the FuturePool.unboundedPool in my example above is an easy way to create a future pool for a demo, and is often just fine, but it isn't appropriate for CPU-intensive computations—see the FuturePool API docs for other ways to create a future pool that will use an ExecutorService that you provide and can manage yourself.)

lock an operator chain while an observable item is passing through it

I have an Observable source and an operator chain that transforms the source into a target type. Generally for each source item, up to one target is produced.
Source -> Operator chain -> Target
Operator logic is kind of complex and involves more than one async database call using the IO scheduler. I omit the details here as they do not seem relevant.
What I see is that new values keep coming from the Source while a previous value is still being processed by the chain. So it resembles some sort of pipeline. This is probably a good thing in many cases, but not in mine.
So I am looking for a way to delay the source items from entering the chain (effectively locking it) until the previous item reaches the Target. Is there a known pattern for doing this?
One ugly solution I see is to use something like this at the beginning of the chain:
zip(source, signal, (source, signal)->source)
where signal is a custom observable that a notification is pushed into every time the chain is ready to accept a new source item (one notification initially, and another when an item being processed reaches the end of the chain).
But I find it a bit hacky. Can this be achieved more gracefully, using a set of standard operators?
Here is a synthetic example that reproduces the behavior I do not want.
The Source is a 100ms interval timer.
The operator chain is a slow (10x slower than the source) async call that computes a square on Schedulers.io().
The Target item is effectively the source value squared.
Subscription s = Observable.timer(100, 100, TimeUnit.MILLISECONDS)
    .doOnNext(source -> System.out.println("source: " + source))
    .concatMap(source -> Observable.create(subscr -> {
        Schedulers.io().createWorker().schedule(() -> {
            subscr.onNext(source * source);
            subscr.onCompleted();
        }, 1000, TimeUnit.MILLISECONDS);
    }))
    .doOnNext(target -> System.out.println("target: " + target))
    .subscribe();

Thread.sleep(10000);
s.unsubscribe();
Both source and target are printed out:
source: 0
source: 1
source: 2
source: 3
source: 4
source: 5
source: 6
source: 7
source: 8
source: 9
source: 10
source: 11
target: 0
source: 12
source: 13
source: 14
source: 15
source: 16
source: 17
source: 18
source: 19
source: 20
target: 1
source: 21
source: 22
source: 23
source: 24
source: 25
source: 26
source: 27
source: 28
source: 29
source: 30
source: 31
target: 4
source: 32
source: 33
But what I would like to achieve is:
source: 0
target: 0
source: 1
target: 1
source: 2
target: 4
...
Depending on your source type, this can be achieved with flatMap parametrized to have maxConcurrency = 1:
Observable.interval(100, 100, TimeUnit.MILLISECONDS)
    .onBackpressureBuffer()
    .doOnNext(source -> System.out.println("source: " + source))
    .flatMap(source ->
        Observable.just(source)
            .map(v -> v * v)
            .delay(1, TimeUnit.SECONDS), 1)
    .doOnNext(target -> System.out.println("target: " + target))
    .subscribe();

Thread.sleep(10000);
This solution involves buffering, but if the source is hot, you might want to choose a different backpressure strategy.
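For instance (a sketch of one alternative, not from the original answer), with a hot source you could keep only the most recent value while the single in-flight item is processed, instead of buffering everything:

Observable.interval(100, 100, TimeUnit.MILLISECONDS)
    // keep only the latest undelivered value; use onBackpressureDrop()
    // to discard instead of replacing
    .onBackpressureLatest()
    .doOnNext(source -> System.out.println("source: " + source))
    .flatMap(source ->
        Observable.just(source)
            .map(v -> v * v)
            .delay(1, TimeUnit.SECONDS), 1)
    .doOnNext(target -> System.out.println("target: " + target))
    .subscribe();

Each time flatMap (with maxConcurrency = 1) is free again, it requests one more value and receives whatever the latest source value was, so stale intermediate values are skipped rather than queued.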
Not strictly related to the requirements but I'd like to point out that this pattern of yours:
Schedulers.io().createWorker().schedule(() -> {
    subscr.onNext(source * source);
    subscr.onCompleted();
}, 1000, TimeUnit.MILLISECONDS);
leaks the worker and will fill up your system with non-reusable threads. If you really want to delay events via a Worker, you should capture and unsubscribe the worker instance:
Scheduler.Worker w = Schedulers.io().createWorker();
subscr.add(w);
w.schedule(() -> {
    try {
        subscr.onNext(source * source);
        subscr.onCompleted();
    } finally {
        w.unsubscribe();
    }
}, 1000, TimeUnit.MILLISECONDS);