In Rx (or RxJava/RxScala), how to make an auto-resetting stateful latch map/filter for measuring in-stream elapsed time to touch a barrier?

In Rx (or RxJava/RxScala), how to make an auto-resetting stateful latch map/filter for measuring in-stream elapsed time to touch a barrier? - scala

Apologies if the question is poorly phrased, I'll do my best.
If I have a sequence of values with times as an Observable[(U,T)] where U is a value and T is a time-like type (or anything difference-able I suppose), how could I write an operator which is an auto-reset one-touch barrier, which is silent when abs(u_n - u_reset) < barrier, but spits out t_n - t_reset if the barrier is touched, at which point it also resets u_reset = u_n.
That is to say, the first value this operator receives becomes the baseline, and it emits nothing. Henceforth it monitors the values of the stream, and as soon as one of them is beyond the baseline value (above or below), it emits the elapsed time (measured by the timestamps of the events), and resets the baseline. These times then will be processed to form a high-frequency estimate of the volatility.
For reference, I am trying to write a volatility estimator outlined in http://www.amazon.com/Volatility-Trading-CD-ROM-Wiley/dp/0470181990 , where rather than measuring the standard deviation (deviations at regular homogeneous times), you repeatedly measure the time taken to breach a barrier for some fixed barrier amount.
Specifically, could this be written using existing operators? I'm a bit stuck on how the state would be reset, though maybe I need to make two nested operators, one which is one-shot and another which keeps creating that one-shot... I know it could be done by writing one by hand, but then I need to write my own publisher etc etc.
Thanks!

I don't fully understand the algorithm and your variables in the example, but you can use flatMap with some heap-state and return empty() or just() as needed:
int[] var1 = { 0 };
source.flatMap(v -> {
var1[0] += v;
if ((var1[0] & 1) == 0) {
return Observable.just(v);
}
return Observable.empty();
});
If you need a per-sequence state because of multiple consumers, you can defer the whole thing:
Observable.defer(() -> {
int[] var1 = { 0 };
return source.flatMap(v -> {
var1[0] += v;
if ((var1[0] & 1) == 0) {
return Observable.just(v);
}
return Observable.empty();
});
}).subscribe(...);

Related

Transferring arrays/classes/records between locales

In a typical N-Body simulation, at the end of each epoch, each locale would need to share its own portion of the world (i.e. all bodies) to the rest of the locales. I am working on this with a local-view approach (i.e. using on Loc statements). I encountered some strange behaviours that I couldn't make sense out of, so I decided to make a test program, in which things got more complicated. Here's the code to replicate the experiment.
proc log(args...?n) {
writeln("[locale = ", here.id, "] [", datetime.now(), "] => ", args);
}
const max: int = 50000;
record stuff {
var x1: int;
var x2: int;
proc init() {
this.x1 = here.id;
this.x2 = here.id;
}
}
class ctuff {
var x1: int;
var x2: int;
proc init() {
this.x1 = here.id;
this.x2 = here.id;
}
}
class wrapper {
// The point is that total size (in bytes) of data in `r`, `c` and `a` are the same here, because the record and the class hold two ints per index.
var r: [{1..max / 2}] stuff;
var c: [{1..max / 2}] owned ctuff?;
var a: [{1..max}] int;
proc init() {
this.a = here.id;
}
}
proc test() {
var wrappers: [LocaleSpace] owned wrapper?;
coforall loc in LocaleSpace {
on Locales[loc] {
wrappers[loc] = new owned wrapper();
}
}
// rest of the experiment further down.
}
Two interesting behaviours happen here.
1. Moving data
Now, each instance of wrapper in array wrappers should live in its locale. Specifically, the references (wrappers) will live in locale 0, but the internal data (r, c, a) should live in the respective locale. So we try to move some from locale 1 to locale 3, as such:
on Locales[3] {
var timer: Timer;
timer.start();
var local_stuff = wrappers[1]!.r;
timer.stop();
log("get r from 1", timer.elapsed());
log(local_stuff);
}
on Locales[3] {
var timer: Timer;
timer.start();
var local_c = wrappers[1]!.c;
timer.stop();
log("get c from 1", timer.elapsed());
}
on Locales[3] {
var timer: Timer;
timer.start();
var local_a = wrappers[1]!.a;
timer.stop();
log("get a from 1", timer.elapsed());
}
Surprisingly, my timings show that
Regardless of the size (const max), the time of sending the array and record strays constant, which doesn't make sense to me. I even checked with chplvis, and the size of GET actually increases, but the time stays the same.
The time to send the class field increases with time, which makes sense, but it is quite slow and I don't know which case to trust here.
2. Querying the locales directly.
To demystify the problem, I also query the .locale.id of some variables directly. First, we query the data, which we expect to live in locale 2, from locale 2:
on Locales[2] {
var wrappers_ref = wrappers[2]!; // This is always 1 GET from 0, okay.
log("array",
wrappers_ref.a.locale.id,
wrappers_ref.a[1].locale.id
);
log("record",
wrappers_ref.r.locale.id,
wrappers_ref.r[1].locale.id,
wrappers_ref.r[1].x1.locale.id,
);
log("class",
wrappers_ref.c.locale.id,
wrappers_ref.c[1]!.locale.id,
wrappers_ref.c[1]!.x1.locale.id
);
}
And the result is:
[locale = 2] [2020-12-26T19:36:26.834472] => (array, 2, 2)
[locale = 2] [2020-12-26T19:36:26.894779] => (record, 2, 2, 2)
[locale = 2] [2020-12-26T19:36:27.023112] => (class, 2, 2, 2)
Which is expected. Yet, if we query the locale of the same data on locale 1, then we get:
[locale = 1] [2020-12-26T19:34:28.509624] => (array, 2, 2)
[locale = 1] [2020-12-26T19:34:28.574125] => (record, 2, 2, 1)
[locale = 1] [2020-12-26T19:34:28.700481] => (class, 2, 2, 2)
Implying that wrappers_ref.r[1].x1.locale.id lives in locale 1, even though it should clearly be on locale 2. My only guess is that by the time .locale.id is executed, the data (i.e. the .x of the record) is already moved to the querying locale (1).
So all in all, the second part of the experiment lead to a secondary question, whilst not answering the first part.
NOTE: all experiment are run with -nl 4 in chapel/chapel-gasnet docker image.

Good observations, let me see if I can shed some light.
As an initial note, any timings taken with the gasnet Docker image should be taken with a grain of salt since that image simulates the execution across multiple nodes using your local system rather than running each locale on its own compute node as intended in Chapel. As a result, it is useful for developing distributed memory programs, but the performance characteristics are likely to be very different than running on an actual cluster or supercomputer. That said, it can still be useful for getting coarse timings (e.g., your "this is taking a much longer time" observation) or for counting communications using chplvis or the CommDiagnostics module.
With respect to your observations about timings, I also observe that the array-of-class case is much slower, and I believe I can explain some of the behaviors:
First, it's important to understand that any cross-node communications can be characterized using a formula like alpha + beta*length. Think of alpha as representing the basic cost of performing the communication, independent of length. This represents the cost of calling down through the software stack to get to the network, putting the data on the wire, receiving it on the other side, and getting it back up through the software stack to the application there. The precise value of alpha will depend on factors like the type of communication, choice of software stack, and physical hardware. Meanwhile, think of beta as representing the per-byte cost of the communication where, as you intuit, longer messages necessarily cost more because there's more data to put on the wire, or potentially to buffer or copy, depending on how the communication is implemented.
In my experience, the value of alpha typically dominates beta for most system configurations. That's not to say that it's free to do longer data transfers, but that the variance in execution time tends to be much smaller for longer vs. shorter transfers than it is for performing a single transfer versus many. As a result, when choosing between performing one transfer of n elements vs. n transfers of 1 element, you'll almost always want the former.
To investigate your timings, I bracketed your timed code portions with calls to the CommDiagnostics module as follows:
resetCommDiagnostics();
startCommDiagnostics();
...code to time here...
stopCommDiagnostics();
printCommDiagnosticsTable();
and found, as you did with chplvis, that the number of communications required to localize the array of records or array of ints was constant as I varied max, for example:
locale
get
execute_on
0
0
0
1
0
0
2
0
0
3
21
1
This is consistent with what I'd expect from the implementation: That for an array of value types, we perform a fixed number of communications to access array meta-data, and then communicate the array elements themselves in a single data transfer to amortize the overheads (avoid paying multiple alpha costs).
In contrast, I found that the number of communications for localizing the array of classes was proportional to the size of the array. For example, for the default value of 50,000 for max, I saw:
locale
get
put
execute_on
0
0
0
0
1
0
0
0
2
0
0
0
3
25040
25000
1
I believe the reason for this distinction relates to the fact that c is an array of owned classes, in which only a single class variable can "own" a given ctuff object at a time. As a result, when copying the elements of array c from one locale to another, you're not just copying raw data, as with the record and integer cases, but also performing an ownership transfer per element. This essentially requires setting the remote value to nil after copying its value to the local class variable. In our current implementation, this seems to be done using a remote get to copy the remote class value to the local one, followed by a remote put to set the remote value to nil, hence, we have a get and put per array element, resulting in O(n) communications rather than O(1) as in the previous cases. With additional effort, we could potentially have the compiler optimize this case, though I believe it will always be more expensive than the others due to the need to perform the ownership transfer.
I tested the hypothesis that owned classes were resulting in the additional overhead by changing your ctuff objects from being owned to unmanaged, which removes any ownership semantics from the implementation. When I do this, I see a constant number of communications, as in the value cases:
locale
get
execute_on
0
0
0
1
0
0
2
0
0
3
21
1
I believe this represents the fact that once the language has no need to manage the ownership of the class variables, it can simply transfer their pointer values in a single transfer again.
Beyond these performance notes, it's important to understand a key semantic difference between classes and records when choosing which to use. A class object is allocated on the heap, and a class variable is essentially a reference or pointer to that object. Thus, when a class variable is copied from one locale to another, only the pointer is copied, and the original object remains where it was (for better or worse). In contrast, a record variable represents the object itself, and can be thought of as being allocated "in place" (e.g., on the stack for a local variable). When a record variable is copied from one locale to the other, it's the object itself (i.e., the record's fields' values) which are copied, resulting in a new copy of the object itself. See this SO question for further details.
Moving on to your second observation, I believe that your interpretation is correct, and that this may be a bug in the implementation (I need to stew on it a bit more to be confident). Specifically, I think you're correct that what's happening is that wrappers_ref.r[1].x1 is being evaluated, with the result being stored in a local variable, and that the .locale.id query is being applied to the local variable storing the result rather than the original field. I tested this theory by taking a ref to the field and then printing locale.id of that ref, as follows:
ref x1loc = wrappers_ref.r[1].x1;
...wrappers_ref.c[1]!.x1.locale.id...
and that seemed to give the right result. I also looked at the generated code which seemed to indicate that our theories were correct. I don't believe that the implementation should behave this way, but need to think about it a bit more before being confident. If you'd like to open a bug against this on Chapel's GitHub issues page, for further discussion there, we'd appreciate that.

"Appending" to an ArraySlice?

Say ...
you have about 20 Thing
very often, you do a complex calculation running through a loop of say 1000 items. The end result is a varying number around 20 each time
you don't know how many there will be until you run through the whole loop
you then want to quickly (and of course elegantly!) access the result set in many places
for performance reasons you don't want to just make a new array each time. note that unfortunately there's a differing amount so you can't just reuse the same array trivially.
What about ...
var thingsBacking = [Thing](repeating: Thing(), count: 100) // hard limit!
var things: ArraySlice<Thing> = []
func fatCalculation() {
var pin: Int = 0
// happily, no need to clean-out thingsBacking
for c in .. some huge loop {
... only some of the items (roughly 20 say) become the result
x = .. one of the result items
thingsBacking[pin] = Thing(... x, y, z )
pin += 1
}
// and then, magic of slices ...
things = thingsBacking[0..<pin]
(Then, you can do this anywhere... for t in things { .. } )
What I am wondering, is there a way you can call to an ArraySlice<Thing> to do that in one step - to "append to" an ArraySlice and avoid having to bother setting the length at the end?
So, something like this ..
things = ... set it to zero length
things.quasiAppend(x)
things.quasiAppend(x2)
things.quasiAppend(x3)
With no further effort, things now has a length of three and indeed the three items are already in the backing array.
I'm particularly interested in performance here (unusually!)
Another approach,
var thingsBacking = [Thing?](repeating: Thing(), count: 100) // hard limit!
and just set the first one after your data to nil as an end-marker. Again, you don't have to waste time zeroing. But the end marker is a nuisance.
Is there a more better way to solve this particular type of array-performance problem?

Based on MartinR's comments, it would seem that for the problem
the data points are incoming and
you don't know how many there will be until the last one (always less than a limit) and
you're having to redo the whole thing at high Hz
It would seem to be best to just:
(1) set up the array
var ra = [Thing](repeating: Thing(), count: 100) // hard limit!
(2) at the start of each run,
.removeAll(keepingCapacity: true)
(3) just go ahead and .append each one.
(4) you don't have to especially mark the end or set a length once finished.
It seems it will indeed then use the same array backing. And it of course "increases the length" as it were each time you append - and you can iterate happily at any time.
Slices - get lost!

Merge Sort algorithm efficiency

I am currently taking an online algorithms course in which the teacher doesn't give code to solve the algorithm, but rather rough pseudo code. So before taking to the internet for the answer, I decided to take a stab at it myself.
In this case, the algorithm that we were looking at is merge sort algorithm. After being given the pseudo code we also dove into analyzing the algorithm for run times against n number of items in an array. After a quick analysis, the teacher arrived at 6nlog(base2)(n) + 6n as an approximate run time for the algorithm.
The pseudo code given was for the merge portion of the algorithm only and was given as follows:
C = output [length = n]
A = 1st sorted array [n/2]
B = 2nd sorted array [n/2]
i = 1
j = 1
for k = 1 to n
if A(i) < B(j)
C(k) = A(i)
i++
else [B(j) < A(i)]
C(k) = B(j)
j++
end
end
He basically did a breakdown of the above taking 4n+2 (2 for the declarations i and j, and 4 for the number of operations performed -- the for, if, array position assignment, and iteration). He simplified this, I believe for the sake of the class, to 6n.
This all makes sense to me, my question arises from the implementation that I am performing and how it effects the algorithms and some of the tradeoffs/inefficiencies it may add.
Below is my code in swift using a playground:
func mergeSort<T:Comparable>(_ array:[T]) -> [T] {
guard array.count > 1 else { return array }
let lowerHalfArray = array[0..<array.count / 2]
let upperHalfArray = array[array.count / 2..<array.count]
let lowerSortedArray = mergeSort(array: Array(lowerHalfArray))
let upperSortedArray = mergeSort(array: Array(upperHalfArray))
return merge(lhs:lowerSortedArray, rhs:upperSortedArray)
}
func merge<T:Comparable>(lhs:[T], rhs:[T]) -> [T] {
guard lhs.count > 0 else { return rhs }
guard rhs.count > 0 else { return lhs }
var i = 0
var j = 0
var mergedArray = [T]()
let loopCount = (lhs.count + rhs.count)
for _ in 0..<loopCount {
if j == rhs.count || (i < lhs.count && lhs[i] < rhs[j]) {
mergedArray.append(lhs[i])
i += 1
} else {
mergedArray.append(rhs[j])
j += 1
}
}
return mergedArray
}
let values = [5,4,8,7,6,3,1,2,9]
let sortedValues = mergeSort(values)
My questions for this are as follows:
Do the guard statements at the start of the merge<T:Comparable> function actually make it more inefficient? Considering we are always halving the array, the only time that it will hold true is for the base case and when there is an odd number of items in the array.
This to me seems like it would actually add more processing and give minimal return since the time that it happens is when we have halved the array to the point where one has no items.
Concerning my if statement in the merge. Since it is checking more than one condition, does this effect the overall efficiency of the algorithm that I have written? If so, the effects to me seems like they vary based on when it would break out of the if statement (e.g at the first condition or the second).
Is this something that is considered heavily when analyzing algorithms, and if so how do you account for the variance when it breaks out from the algorithm?
Any other analysis/tips you can give me on what I have written would be greatly appreciated.

You will very soon learn about Big-O and Big-Theta where you don't care about exact runtimes (believe me when I say very soon, like in a lecture or two). Until then, this is what you need to know:
Yes, the guards take some time, but it is the same amount of time in every iteration. So if each iteration takes X amount of time without the guard and you do n function calls, then it takes X*n amount of time in total. Now add in the guards who take Y amount of time in each call. You now need (X+Y)*n time in total. This is a constant factor, and when n becomes very large the (X+Y) factor becomes negligible compared to the n factor. That is, if you can reduce a function X*n to (X+Y)*(log n) then it is worthwhile to add the Y amount of work because you do fewer iterations in total.
The same reasoning applies to your second question. Yes, checking "if X or Y" takes more time than checking "if X" but it is a constant factor. The extra time does not vary with the size of n.
In some languages you only check the second condition if the first fails. How do we account for that? The simplest solution is to realize that the upper bound of the number of comparisons will be 3, while the number of iterations can be potentially millions with a large n. But 3 is a constant number, so it adds at most a constant amount of work per iteration. You can go into nitty-gritty details and try to reason about the distribution of how often the first, second and third condition will be true or false, but often you don't really want to go down that road. Pretend that you always do all the comparisons.
So yes, adding the guards might be bad for your runtime if you do the same number of iterations as before. But sometimes adding extra work in each iteration can decrease the number of iterations needed.

Rx debouncing inputs

I need to debounce an input-stream.
At the first occurrence of state 1 I need to wait for 5 Seconds and verify if the laste state was also 1.
Only than I have a stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-1
(result) -> 1
Here is an example of a non-stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-0
(result) -> 0
I tried using a buffer, but a buffer has fixed starting point and I need to wait for 5 seconds starting with my first event.

Taking your requirements literally
At the first occurrence of state 1 I need to wait for 5 Seconds and
verify if the laste state was also 1. Only than I have a stable
signal.
I can come up with a few ways to solve this problem.
To clarify my assumptions, you just want to push the last value produced 5 seconds after the first occurrence of a 1. This will result in a single value sequence producing either a 0 or a 1 (ie. regardless of any further values produced past 5 seconds from the source sequence)
Here I recreate you sequence with some jiggery-pokery.
var source = Observable.Timer(TimeSpan.Zero,TimeSpan.FromSeconds(1))
.Take(10)
.Select(i=>{if(i==5 || i==7 || i==9){return 1;}else{return 0;}}); //Should produce 1;
//.Select(i=>{if(i==5 || i==7 ){return 1;}else{return 0;}}); //Should produce 0;
All of the options below look to share the sequence. To share a sequence safely in Rx we Publish() and connect it. I use automatic connecting via the RefCount() operator.
var sharedSource = source.Publish().RefCount();
1) In this solution we take the first value of 1, and then buffer the selected the values of the sequence in to buffer sizes of 5 seconds. We only take the first of these buffers. Once we get this buffer, we get the last value and push that. If the buffer is empty, I assume we push a one as the last value was the '1' that started the buffer from running.
sharedSource.Where(state=>state==1)
.Take(1)
.SelectMany(_=>sharedSource.Buffer(TimeSpan.FromSeconds(5)).Take(1))
.Select(buffer=>
{
if(buffer.Any())
{
return buffer.Last();
}
else{
return 1;
}
})
.Dump();
2) In this solution I take the approach to only start listening once we get a valid value (1) and then take all values until a timer triggers the termination. From here we take the last value produced.
var fromFirstValid = sharedSource.SkipWhile(state=>state==0);
fromFirstValid
.TakeUntil(
fromFirstValid.Take(1)
.SelectMany(_=>Observable.Timer(TimeSpan.FromSeconds(5))))
.TakeLast(1)
.Dump();
3) In this solution I use the window operator to create a single window that opens when the first value of '1' happens and then closes when 5 seconds elapses. Again we just take the last value
sharedSource.Window(
sharedSource.Where(state=>state==1),
_=>Observable.Timer(TimeSpan.FromSeconds(5)))
.SelectMany(window=>window.TakeLast(1))
.Take(1)
.Dump();
So lots of different ways to skin-a-cat.

It sounds (at a glance) like you want Throttle, not Buffer, although some more information on your use cases would help pin that down - at any rate, here's how you might Throttle your stream:
void Main()
{
var subject = new Subject<int>();
var source = subject.Publish().RefCount();
var query = source
// Start counting on a 1, wait 5 seconds, and take the last value
.Throttle(x => Observable.Timer(TimeSpan.FromSeconds(5)));
using(query.Subscribe(Console.WriteLine))
{
// This sequence should produce a one
subject.OnNext(1);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(1);
Console.ReadLine();
// This sequence should produce a zero
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(0);
Console.ReadLine();
}
}

Which costs more while looping; assignment or an if-statement?

Consider the following 2 scenarios:
boolean b = false;
int i = 0;
while(i++ < 5) {
b = true;
}
OR
boolean b = false;
int i = 0;
while(i++ < 5) {
if(!b) {
b = true;
}
}
Which is more "costly" to do? If the answer depends on used language/compiler, please provide. My main programming language is Java.
Please do not ask questions like why would I want to do either.. They're just barebone examples that point out the relevant: should a variable be set the same value in a loop over and over again or should it be tested on every loop that it holds a value needed to change?

Please do not forget the rules of Optimization Club.
The first rule of Optimization Club is, you do not Optimize.
The second rule of Optimization Club is, you do not Optimize without measuring.
If your app is running faster than the underlying transport protocol, the optimization is over.
One factor at a time.
No marketroids, no marketroid schedules.
Testing will go on as long as it has to.
If this is your first night at Optimization Club, you have to write a test case.
It seems that you have broken rule 2. You have no measurement. If you really want to know, you'll answer the question yourself by setting up a test that runs scenario A against scenario B and finds the answer. There are so many differences between different environments, we can't answer.

Have you tested this? Working on a Linux system, I put your first example in a file called LoopTestNoIf.java and your second in a file called LoopTestWithIf.java, wrapped a main function and class around each of them, compiled, and then ran with this bash script:
#!/bin/bash
function run_test {
iter=0
while [ $iter -lt 100 ]
do
java $1
let iter=iter+1
done
}
time run_test LoopTestNoIf
time run_test LoopTestWithIf
The results were:
real 0m10.358s
user 0m4.349s
sys 0m1.159s
real 0m10.339s
user 0m4.299s
sys 0m1.178s
Showing that having the if makes it slight faster on my system.

Are you trying to find out if doing the assignment each loop is faster in total run time than doing a check each loop and only assigning once on satisfaction of the test condition?
In the above example I would guess that the first is faster. You perform 5 assignments. In the latter you perform 5 test and then an assignment.
But you'll need to up the iteration count and throw in some stopwatch timers to know for sure.

Actually, this is the question I was interested in… (I hoped that I’ll find the answer somewhere to avoid own testing. Well, I didn’t…)
To be sure that your (mine) test is valid, you (I) have to do enough iterations to get enough data. Each iteration must be “long” enough (I mean the time scale) to show the true difference. I’ve found out that even one billion iterations are not enough to fit to time interval that would be long enough… So I wrote this test:
for (int k = 0; k < 1000; ++k)
{
{
long stopwatch = System.nanoTime();
boolean b = false;
int i = 0, j = 0;
while (i++ < 1000000)
while (j++ < 1000000)
{
int a = i * j; // to slow down a bit
b = true;
a /= 2; // to slow down a bit more
}
long time = System.nanoTime() - stopwatch;
System.out.println("\\tasgn\t" + time);
}
{
long stopwatch = System.nanoTime();
boolean b = false;
int i = 0, j = 0;
while (i++ < 1000000)
while (j++ < 1000000)
{
int a = i * j; // the same thing as above
if (!b)
{
b = true;
}
a /= 2;
}
long time = System.nanoTime() - stopwatch;
System.out.println("\\tif\t" + time);
}
}
I ran the test three times storing the data in Excel, then I swapped the first (‘asgn’) and second (‘if’) case and ran it three times again… And the result? Four times “won” the ‘if’ case and two times the ‘asgn’ appeared to be the better case. This shows how sensitive the execution might be. But in general, I hope that this has also proven that the ‘if’ case is better choice.
Thanks, anyway…

Any compiler (except, perhaps, in debug) will optimize both these statements to
bool b = true;
But generally, relative speed of assignment and branch depend on processor architecture, and not on compiler. A modern, super-scalar processor perform horribly on branches. A simple micro-controller uses roughly the same number of cycles per any instruction.

Relative to your barebones example (and perhaps your real application):
boolean b = false;
// .. other stuff, might change b
int i = 0;
// .. other stuff, might change i
b |= i < 5;
while(i++ < 5) {
// .. stuff with i, possibly stuff with b, but no assignment to b
}
problem solved?
But really - it's going to be a question of the cost of your test (generally more than just if (boolean)) and the cost of your assignment (generally more than just primitive = x). If the test/assignment is expensive or your loop is long enough or you have high enough performance demands, you might want to break it into two parts - but all of those criteria require that you test how things perform. Of course, if your requirements are more demanding (say, b can flip back and forth), you might require a more complex solution.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse