Rx Subscribe issue: how to update data in a UI control in WinForms (System.Reactive)

I would like to write a small project that performs some calculations and adds the calculated results to a ListBox.
My code:
int SumLoop(int lowLimit, int highLimit)
{
    int idx;
    int totalSum = 0;
    for (idx = lowLimit; idx <= highLimit; idx = idx + 1)
    {
        totalSum += idx;
    }
    return totalSum;
}
private void button1_Click(object sender, EventArgs e)
{
    var test2 = Observable.Interval(TimeSpan.FromMilliseconds(1000)).Select(x => (int)x).Take(10);
    test2.Subscribe(n =>
    {
        this.BeginInvoke(new Action(() =>
        {
            listBox1.Items.Add("input:" + n);
            listBox1.Items.Add("result:" + SumLoop(n, 99900000));
        }));
    });
}
The result:
input:0
result:376307504
(stop a while)
input:1
result:376307504
(stop a while)
input:2
result:376307503
(stop a while)
input:3
result:376307501
(stop a while)
...
input:9
result:376307468
If I change the interval from 1000 to 10:
var test2 = Observable.Interval(TimeSpan.FromMilliseconds(10)).Select(x => (int)x).Take(10);
the display behavior changes. The ListBox shows all inputs and results in one shot. It seems to wait for all results to complete and then display everything in the ListBox at once. Why?
I would like to keep the 10 ms interval without displaying everything in one shot. I want to display "input:0", wait for the calculation, display "result:376307504", and so on.
How can I do this?
Thanks for your help.

If I understand you correctly, you want to run the sum loop off the UI thread. Here's how you would do that:
Observable
    .Interval(TimeSpan.FromMilliseconds(10))
    .Select(x => (int)x)
    .Select(x => SumLoop(x, 99900000)) // runs on a background (thread pool) thread, not the UI thread
    .Take(10)
    .ObserveOn(listBox1) // WinForms ObserveOn(Control) overload; or ObserveOnDispatcher() if you're using WPF
    .Subscribe(r =>
    {
        listBox1.Items.Add("result:" + r);
    });
You should see the results trickle in on an interval of 10ms + ~500ms.

Instead of doing control.Invoke/control.BeginInvoke, you'll want to call .ObserveOn(control) in WinForms (or .ObserveOnDispatcher() in WPF) to get your action invoked on the UI thread:
Observable
    .Interval(TimeSpan.FromMilliseconds(1000))
    .Select(x => (int)x)
    .Take(10)
    .ObserveOn(this) // WinForms: the form is a Control; use .ObserveOnDispatcher() in WPF
    .Subscribe(x =>
    {
        listBox1.Items.Add("input:" + x);
        listBox1.Items.Add("result:" + SumLoop(x, 99900000));
    });
You said that if you change the interval from 1000 ms to 10ms, you observe different behavior.
The listbox will display all inputs and results just a shot.
I suspect this is because at 10 ms the actions are posted so fast that they all queue up. When the UI thread comes around to execute them, it executes everything that's queued back to back, and since it is busy the whole time, the ListBox doesn't get a chance to repaint until the queue is empty.
In contrast, posting them every 1000 ms (one second) allows the UI thread to execute one, rest (and repaint), execute another one, rest, and so on.
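A rough way to see this queuing effect is a small virtual-time model. This is a JavaScript sketch, not WinForms: `simulateUiLoop` is a hypothetical helper, each posted callback is assumed to keep the UI thread busy for about 500 ms (the figure mentioned in the first answer), and the ListBox is assumed to repaint only when no other callback is already waiting, which is roughly how Windows treats paint messages.

```javascript
// Virtual-time model (hypothetical, not WinForms): callbacks are posted every
// intervalMs, each one occupies the UI thread for workMs, and the list box is
// repainted only when no further callback is already waiting in the queue.
function simulateUiLoop(intervalMs, workMs, count) {
  const repaints = []; // number of items visible at each repaint
  let busyUntil = 0;   // virtual time at which the UI thread becomes free

  for (let i = 0; i < count; i++) {
    const postedAt = i * intervalMs;
    const start = Math.max(postedAt, busyUntil); // wait for earlier callbacks
    busyUntil = start + workMs;

    const isLast = i === count - 1;
    const nextPostedAt = (i + 1) * intervalMs;
    // Queue is empty once this callback finishes, so the UI gets to repaint.
    if (isLast || nextPostedAt > busyUntil) repaints.push(i + 1);
  }
  return repaints;
}

console.log(simulateUiLoop(1000, 500, 10)); // one repaint per item
console.log(simulateUiLoop(10, 500, 10));   // a single repaint at the end
```

Passing a small workMs (for example simulateUiLoop(10, 1, 10)) models moving SumLoop off the UI thread, as in the first answer: every callback then finishes before the next one is posted, so each result gets its own repaint.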

Related

How to create basic pagination logic in Gatling?

So I am trying to create basic pagination in Gatling but failing miserably.
My current scenario is as follows:
1) First call is always a POST request, the body includes page index 1 and page size 50
{"pageIndex":1,"pageSize":50}
I then receive 50 objects plus the total count of objects in the environment:
"totalCount":232
Since I need to iterate through all objects in the environment, I will need to POST this call 5 times, each time with an updated pageIndex.
My current (failing) code looks like:
def getAndPaginate(jsonBody: String) = {
val pageSize = 50;
var totalCount: Int = 0
var currentPage: Int = 1
var totalPages: Int =0
exec(session => session.set("pageIndex", currentPage))
exec(http("Get page")
.post("/api")
.body(ElFileBody("json/" + jsonBody)).asJson
.check(jsonPath("$.result.objects[?(@.type == 'type')].id").findAll.saveAs("list")))
.check(jsonPath("$.result.totalCount").saveAs("totalCount"))
.exec(session => {
totalCount = session("totalCount").as[Int]
totalPages = Math.ceil(totalCount/pageSize).toInt
session})
.asLongAs(currentPage <= totalPages)
{
exec(http("Get assets action list")
.post("/api")
.body(ElFileBody("json/" + jsonBody)).asJson
.check(jsonPath("$.result.objects[?(@.type == 'type')].id").findAll.saveAs("list")))
currentPage = currentPage+1
exec(session => session.set("pageIndex", currentPage))
pause(Config.minDelayValue seconds, Config.maxDelayValue seconds)
}
}
Currently the pagination values are not assigned to the variables that I create at the beginning of the function. If I create the variables at the object level, they are assigned, but in a manner I don't understand. For example, the result of Math.ceil(totalCount/pageSize).toInt is 4 when it should be 5. (It is 5 if I evaluate it in the immediate window... I don't get it.) I would then expect asLongAs(currentPage <= totalPages) to repeat 5 times, but it only repeats twice.
I tried to create the function in a class rather than an object because, as far as I understand, there is only one instance of an object. (To rule out multiple users accessing the same variable, I also ran only one user, with the same result.)
I am obviously missing something basic here (new to Gatling and Scala) so any help would be highly appreciated :)
Using regular Scala variables to hold the values isn't going to work: the Gatling DSL defines builders that are only executed once at startup, so lines like
.asLongAs(currentPage <= totalPages)
will only ever execute with the initial values.
So you just need to handle everything using session variables:
def getAndPaginate(jsonBody: String) = {
  val pageSize = 50
  exec(session => session.set("notDone", true))
    .asLongAs("${notDone}", "index") {
      // keep the 1-based pageIndex in step with the 0-based loop counter
      // (assumes the JSON template reads ${pageIndex}, as in the question)
      exec(session => session.set("pageIndex", session("index").as[Int] + 1))
        .exec(http("Get assets action list")
          .post("/api")
          .body(ElFileBody("json/" + jsonBody)).asJson
          .check(
            jsonPath("$.result.totalCount")
              // transform lets us take the result of a check (and, optionally, the session) and modify it before storing,
              // so we can use it to store a boolean that stays true while more pages remain
              .transform((totalCount, session) => ((session("index").as[Int] + 1) * pageSize) < totalCount.toInt)
              .saveAs("notDone")
          )
        )
        .pause(Config.minDelayValue seconds, Config.maxDelayValue seconds)
    }
}
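The question also asks why Math.ceil(totalCount/pageSize).toInt gives 4. In Scala, totalCount/pageSize with two Int operands is integer division, so 232/50 is already 4 before Math.ceil sees it; math.ceil(totalCount.toDouble / pageSize).toInt or (totalCount + pageSize - 1) / pageSize gives 5. The JavaScript sketch below checks that arithmetic (JavaScript division is floating point, so Math.trunc stands in for Scala's Int division) and replays the answer's notDone condition to count how many requests the loop makes. The function names are hypothetical.

```javascript
// Scala's Int / Int truncates toward zero; Math.trunc mimics that here.
function pagesWithIntDivision(totalCount, pageSize) {
  return Math.ceil(Math.trunc(totalCount / pageSize)); // ceil of an already-truncated value
}

// Ceiling division using only integer arithmetic (valid for positive inputs).
function pageCount(totalCount, pageSize) {
  return Math.trunc((totalCount + pageSize - 1) / pageSize);
}

// Replays the answer's loop: "notDone" starts true, and after each request the
// check saves (index + 1) * pageSize < totalCount, where index is the 0-based counter.
function requestsMade(totalCount, pageSize) {
  let notDone = true;
  let requests = 0;
  for (let index = 0; notDone; index++) {
    requests++; // one POST per iteration
    notDone = (index + 1) * pageSize < totalCount;
  }
  return requests;
}

console.log(pagesWithIntDivision(232, 50)); // 4
console.log(pageCount(232, 50));            // 5
console.log(requestsMade(232, 50));         // 5
```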

RxJs: request list from server, consume values, re-request when we're almost out of values

I'm fetching a list of items from a REST api. The user interacts with each one via a click, and when there are only, say, a couple left unused, I'd like to repeat the request to get more items. I'm trying to do this using a proper RxJs (5) stream-oriented approach.
So, something like:
var userClick$ = Observable.fromEvent(button.nativeElement, 'click');
var needToExtend$ = new BehaviorSubject(1);
var list$ = needToExtend$
.flatMap( () => this.http.get("http://myserver/get-list") )
.flatMap( x => x['list'] );
var itemsUsed$ = userClick$.zip(list$, (click, item) => item);
itemsUsed$.subscribe( item => use(item) );
and then, to trigger a re-load when necessary:
list$.subscribe(
if (list$.isEmpty()) {
needToExtend$.next(1);
}
)
This last bit is wrong, and manually re-triggering doesn't seem very "stream-oriented" even if it did work as intended. Any ideas?
This is similar to Rxjs - Consume API output and re-query when cache is empty but I can't make assumptions about the length of the list returned by the API, and I'd like to re-request before the list is completely consumed. And the solution there feels a bit too clever. There must be a more readable way, right?
How about something like this:
const LIST_LIMIT = 3;
userClick$ = Observable.fromEvent(button.nativeElement, 'click');
list$ = this.http.get("http://myserver/get-list").map(r => r.list);
clickCounter$ = this.userClick$.scan((acc: number, val) => acc + 1, 0);
getList$ = new BehaviorSubject([]);
this.getList$
.switchMap(previousList => this.list$)
.switchMap(list => this.clickCounter$, (list, clickCount) => { return {list, clickCount}; })
.filter(({list, clickCount}) => clickCount >= list.length - LIST_LIMIT)
.map(({list, clickCount}) => list)
.subscribe(this.getList$);
The logic here is that you define a list-getter stream and a signal to trigger it.
First, the signal causes switchMap to fetch a new list, which is then fed into another switchMap that resubscribes to a click counter. You combine the results of both streams and feed them to filter, which only emits when the click count is greater than or equal to the list length minus 3 (or whatever you want). Then the signal is subscribed to this whole stream so that it retriggers itself.
Edit: the biggest weakness of this is that you need to set the list value (for display) in a side effect rather than in subscription or with the async pipe. You can rearrange it and multicast though:
const LIST_LIMIT = 3;
userClick$ = Observable.fromEvent(button.nativeElement, 'click');
list$ = this.http.get("http://myserver/get-list").map(r => r.list);
clickCounter$: Observable<number> = this.userClick$.scan((acc: number, val) => acc + 1, 0).startWith(0);
getList$ = new BehaviorSubject([]);
refresh$ = this.getList$
.switchMap(list => this.clickCounter$
.filter(clickCount => list.length <= clickCount + LIST_LIMIT)
.first(),
(list, clickCount) => list)
.switchMap(previousList => this.list$)
.multicast(() => this.getList$);
this.refresh$.connect();
this.refresh$.subscribe(e => console.log(e));
This way has a few advantages, but may be a little less "readable". The pieces are mostly the same, but instead you go to the counter first and let that lead into the switch to the list fetch, and you multicast it to restart the counter.
I'm not clear on how you are tracking getting the next set of items so I will assume it is some form of paging for my answer. I also assume that you don't know the total number of items.
console.clear();
const pageSize = 5;
const pageBuffer = 2;
const data = [...Array(17).keys()]
function getData(page) {
const begin = pageSize * page
const end = begin + pageSize;
return Rx.Observable.of(data.slice(begin, end));
}
const clicks = Rx.Observable.interval(400);
clicks
.scan(count => ++count, 0)
.do(() => console.log('click'))
.map(count => {
const page = Math.floor(count / pageSize) + 1;
const total = page * pageSize;
return { total, page, count }
})
.filter(x => x.total - pageBuffer === x.count)
.startWith({ page: 0 })
.switchMap(x => getData(x.page))
.takeWhile(x => x.length > 0)
.subscribe(
x => { console.log('next: ', x); },
x => { console.log('error: ', x); },
() => { console.log('completed'); }
);
<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/5.5.3/Rx.min.js"></script>
Here is an explanation:
Rx.Observable.interval(#): simulates the client click events.
.scan(...): accumulates the click events into a running count.
.map(...): calculates the page index and the potential total item count (the actual count could be less, but that doesn't matter for our purposes).
.filter(...): only lets a value through to get a new page of data when the count has just hit the page buffer.
.startWith(...): gets the first page without waiting for clicks. The +1 in the page calculation in the .map accounts for this.
.switchMap(...): gets the next page of data.
.takeWhile(...): keeps the stream open until we get an empty list.
So it will get an initial page and then go get a new page whenever the number of clicks comes within the designated buffer. Once all items have been retrieved (known by empty list) it will complete.
One thing I didn't figure out how to do is to complete the list when the page length is less than the page size. Not sure if it matters to you.
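The .map/.filter arithmetic from this answer can be checked without RxJS. In the plain-JavaScript sketch below, the hypothetical `pagesRequested` replays the running count from .scan, the page/total calculation from .map and the test in .filter, and collects the page indexes that would be fetched; the leading page 0 stands in for .startWith.

```javascript
// Replays the .scan/.map/.filter/.startWith steps from the answer for
// `clicks` simulated clicks and returns the page indexes that would be fetched.
function pagesRequested(clicks, pageSize, pageBuffer) {
  const pages = [0]; // .startWith({ page: 0 }) fetches the first page immediately
  for (let count = 1; count <= clicks; count++) { // .scan: 1, 2, 3, ...
    const page = Math.floor(count / pageSize) + 1; // .map
    const total = page * pageSize;
    if (total - pageBuffer === count) pages.push(page); // .filter
  }
  return pages;
}

console.log(pagesRequested(20, 5, 2)); // pages 0 to 4, new ones at clicks 3, 8, 13 and 18
```

With the answer's values (pageSize = 5, pageBuffer = 2), the 17-item data array makes page 4 come back empty, which is where .takeWhile completes the stream.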

Confusion over behavior of Publish().Refcount()

I've got a simple program here that displays the number of letters in various words. It works as expected.
static void Main(string[] args) {
var word = new Subject<string>();
var wordPub = word.Publish().RefCount();
var length = word.Select(i => i.Length);
var report =
wordPub
.GroupJoin(length,
s => wordPub,
s => Observable.Empty<int>(),
(w, a) => new { Word = w, Lengths = a })
.SelectMany(i => i.Lengths.Select(j => new { Word = i.Word, Length = j }));
report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
word.OnNext("Apple");
word.OnNext("Banana");
word.OnNext("Cat");
word.OnNext("Donkey");
word.OnNext("Elephant");
word.OnNext("Zebra");
Console.ReadLine();
}
And the output is:
Apple 5
Banana 6
Cat 3
Donkey 6
Elephant 8
Zebra 5
I used Publish().RefCount() because wordPub is included in report twice. Without it, when a word is emitted, first one part of the report would be notified by a callback and then the other part would be notified, doubling the notifications. That is kind of what happens: the output ends up having 11 items rather than 6. At least, that is what I think is going on. I think of Publish().RefCount() in this situation as updating both parts of the report simultaneously.
However if I change the length function to ALSO use the published source like this:
var length = wordPub.Select(i => i.Length);
Then the output is this:
Apple 5
Apple 6
Banana 6
Cat 3
Banana 3
Cat 6
Donkey 6
Elephant 8
Donkey 8
Elephant 5
Zebra 5
Why can't the length function also use the same published source?
This was a great challenge to solve!
So subtle the conditions that this happens.
Apologies in advance for the long explanation, but bear with me!
TL;DR
Subscriptions to the published source are processed in order, but before any other subscription directly to the unpublished source. i.e. you can jump the queue!
With GroupJoin subscription order is important to determine when windows open and close.
My first concern would be that you are publish refcounting a subject.
This should be a no-op.
Subject<T> has no subscription cost.
So when you remove the Publish().RefCount() :
var word = new Subject<string>();
var wordPub = word;//.Publish().RefCount();
var length = word.Select(i => i.Length);
then you get the same issue.
So then I look to the GroupJoin (because my intuition suggests that Publish().RefCount() is a red herring).
For me, eyeballing this alone was too hard to rationalise, so I lean on a simple debugging tool I have used dozens of times over the years: a Trace or Log extension method.
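using System;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Threading;
public interface ILogger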
{
void Log(string input);
}
public class DumpLogger : ILogger
{
public void Log(string input)
{
//LinqPad `Dump()` extension method.
// Could use Console.Write instead.
input.Dump();
}
}
public static class ObservableLoggingExtensions
{
private static int _index = 0;
public static IObservable<T> Log<T>(this IObservable<T> source, ILogger logger, string name)
{
return Observable.Create<T>(o =>
{
var index = Interlocked.Increment(ref _index);
var label = $"{index:0000}{name}";
logger.Log($"{label}.Subscribe()");
var disposed = Disposable.Create(() => logger.Log($"{label}.Dispose()"));
var subscription = source
.Do(
x => logger.Log($"{label}.OnNext({x.ToString()})"),
ex => logger.Log($"{label}.OnError({ex})"),
() => logger.Log($"{label}.OnCompleted()")
)
.Subscribe(o);
return new CompositeDisposable(subscription, disposed);
});
}
}
When I add the logging to your provided code it looks like this:
var logger = new DumpLogger();
var word = new Subject<string>();
var wordPub = word.Publish().RefCount();
var length = word.Select(i => i.Length);
var report =
wordPub.Log(logger, "lhs")
.GroupJoin(word.Select(i => i.Length).Log(logger, "rhs"),
s => wordPub.Log(logger, "lhsDuration"),
s => Observable.Empty<int>().Log(logger, "rhsDuration"),
(w, a) => new { Word = w, Lengths = a })
.SelectMany(i => i.Lengths.Select(j => new { Word = i.Word, Length = j }));
report.Subscribe(i => ($"{i.Word} {i.Length}").Dump("OnNext"));
word.OnNext("Apple");
word.OnNext("Banana");
word.OnNext("Cat");
word.OnNext("Donkey");
word.OnNext("Elephant");
word.OnNext("Zebra");
This will then output in my log something like the following
Log with Publish().RefCount() used
0001lhs.Subscribe()
0002rhs.Subscribe()
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()
OnNext
Apple 5
0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()
OnNext
Banana 6
...
However, when I remove the Publish().RefCount() usage, the new log output is as follows:
Log with only Subject
0001lhs.Subscribe()
0002rhs.Subscribe()
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()
OnNext
Apple 5
0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()
OnNext
Apple 6
OnNext
Banana 6
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()
...
This gives us some insight; however, the issue only really becomes clear when we start annotating our logs with a logical list of subscriptions.
In the original (working) code with the RefCount, our annotations might look like this:
//word.Subscribers.Add(wordPub)
0001lhs.Subscribe() //wordPub.Subscribers.Add(0001lhs)
0002rhs.Subscribe() //word.Subscribers.Add(0002rhs)
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe() //wordPub.Subscribers.Add(0003lhsDuration)
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()
OnNext
Apple 5
0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe() //wordPub.Subscribers.Add(0005lhsDuration)
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose() //wordPub.Subscribers.Remove(0003lhsDuration)
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()
OnNext
Banana 6
So in this example, when word.OnNext("Banana"); is executed, the chain of observers is linked in this order:
wordPub
0002rhs
However, wordPub has child subscriptions!
So the real subscription list looks like
wordPub
0001lhs
0003lhsDuration
0005lhsDuration
0002rhs
If we annotate the Subject only log we see where the subtlety lies
0001lhs.Subscribe() //word.Subscribers.Add(0001lhs)
0002rhs.Subscribe() //word.Subscribers.Add(0002rhs)
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe() //word.Subscribers.Add(0003lhsDuration)
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()
OnNext
Apple 5
0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe() //word.Subscribers.Add(0005lhsDuration)
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()
OnNext
Apple 6
OnNext
Banana 6
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()
So in this example, when word.OnNext("Banana"); is executed, the chain of observers is linked in this order:
1. 0001lhs
2. 0002rhs
3. 0003lhsDuration
4. 0005lhsDuration
As the 0003lhsDuration subscription is activated after 0002rhs, it won't see the "Banana" value that terminates the window until after 0002rhs has been sent the value, so the length is yielded in the still-open window.
Whew
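The queue-jumping effect can be reproduced with a toy subject that notifies observers in subscription order. This is a hypothetical JavaScript model, not Rx.NET's Subject or Publish implementation; the published case is modelled as an intermediate subject that subscribes to word once, before rhs, because RefCount connects on the first (lhs) subscription.

```javascript
// Toy synchronous subject (hypothetical, not Rx.NET): notifies observers in
// the order they subscribed.
class ToySubject {
  constructor() { this.observers = []; }
  subscribe(observer) { this.observers.push(observer); }
  next(value) {
    // Snapshot so observers added during delivery don't receive this value.
    for (const observer of [...this.observers]) observer(value);
  }
}

// Returns the order in which the three GroupJoin subscriptions see "Banana".
function deliveryOrder(usePublished) {
  const order = [];
  const word = new ToySubject();
  let wordPub = word;
  if (usePublished) {
    // The shared subject takes a single slot in word's list, ahead of rhs.
    wordPub = new ToySubject();
    word.subscribe(value => wordPub.next(value));
  }
  wordPub.subscribe(() => order.push("lhs"));
  word.subscribe(() => order.push("rhs"));
  wordPub.subscribe(() => order.push("lhsDuration")); // window opened by "Apple"
  word.next("Banana");
  return order;
}

console.log(deliveryOrder(false)); // lhs, rhs, lhsDuration
console.log(deliveryOrder(true));  // lhs, lhsDuration, rhs
```

In the Subject-only case, rhs is notified before lhsDuration can close the window, which matches the "Apple 6" line in the log above.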
As @francezu13k50 points out, the obvious and simple solution to your problem is to just use word.Select(x => new { Word = x, Length = x.Length });, but as I think you have given us a simplified version of your real problem (appreciated), I understand why this isn't suitable.
However, as I don't know what your real problem space is, I am not sure what to suggest, except that you already have a working solution in your current code, and now you should know why it works the way it does.
RefCount returns an Observable that stays connected to the source as long as there is at least one subscription to the returned Observable. When the last subscription is disposed, RefCount disposes its connection to the source, and reconnects when a new subscription is made. It might be the case with your report query that all subscriptions to wordPub are disposed before the query is fulfilled.
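A minimal sketch of the RefCount bookkeeping described above, in plain JavaScript. `refCount` and the `connect` callback are hypothetical stand-ins for the operator and for Connect() on the published observable; the point is only when connecting and disconnecting happen.

```javascript
// connect() starts the shared subscription and returns a function that stops it.
function refCount(connect) {
  let count = 0;
  let disconnect = null;
  return function subscribe() {
    if (count++ === 0) disconnect = connect(); // first subscriber connects
    let disposed = false;
    return function dispose() {
      if (disposed) return; // disposing twice is a no-op
      disposed = true;
      if (--count === 0) {  // last subscriber gone: drop the connection
        disconnect();
        disconnect = null;
      }
    };
  };
}

let connects = 0;
let disconnects = 0;
const subscribe = refCount(() => { connects++; return () => { disconnects++; }; });

const first = subscribe();
const second = subscribe(); // shares the existing connection
first();
second();                   // last one out: disconnects
const third = subscribe();  // reconnects to the source
console.log(connects, disconnects); // 2 1
third();
```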
Instead of the complicated GroupJoin query, you could simply do:
var report = word.Select(x => new { Word = x, Length = x.Length });
Edit:
Change your report query to this if you want to use the GroupJoin operator:
var report =
wordPub
.GroupJoin(length,
s => wordPub,
s => Observable.Empty<int>(),
(w, a) => new { Word = w, Lengths = a })
.SelectMany(i => i.Lengths.FirstAsync().Select(j => new { Word = i.Word, Length = j }));
Because GroupJoin seems to be very tricky to work with, here is another approach for correlating the inputs and outputs of functions.
static void Main(string[] args) {
var word = new Subject<string>();
var length = new Subject<int>();
var report =
word
.CombineLatest(length, (w, l) => new { Word = w, Length = l })
.Scan((a, b) => new { Word = b.Word, Length = a.Word == b.Word ? b.Length : -1 })
.Where(i => i.Length != -1);
report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
word.OnNext("Apple"); length.OnNext(5);
word.OnNext("Banana");
word.OnNext("Cat"); length.OnNext(3);
word.OnNext("Donkey");
word.OnNext("Elephant"); length.OnNext(8);
word.OnNext("Zebra"); length.OnNext(5);
Console.ReadLine();
}
This approach works if every input has 0 or more outputs subject to the constraints that (1) outputs only arrive in the same order as the inputs AND (2) each output corresponds to its most recent input. This is like a LeftJoin - each item in the first list (word) is paired with items in the right list (length) that subsequently arrive, up until another item in the first list is emitted.
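The pairing rule above (each output belongs to the most recent input) can be written without Rx. This plain-JavaScript sketch is an imperative restatement of that rule, not the CombineLatest/Scan pipeline itself; `pairOutputs` and the event shape are hypothetical.

```javascript
// events: [{ word: "Apple" }, { length: 5 }, ...] in arrival order.
// Each length is paired with the most recent word; words without a length produce nothing.
function pairOutputs(events) {
  const report = [];
  let latestWord = null;
  for (const e of events) {
    if ("word" in e) latestWord = e.word;
    else if (latestWord !== null) report.push(`${latestWord} ${e.length}`);
  }
  return report;
}

const events = [
  { word: "Apple" }, { length: 5 },
  { word: "Banana" },
  { word: "Cat" }, { length: 3 },
  { word: "Donkey" },
  { word: "Elephant" }, { length: 8 },
  { word: "Zebra" }, { length: 5 },
];
console.log(pairOutputs(events)); // Apple 5, Cat 3, Elephant 8, Zebra 5
```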
I tried using a regular Join instead of GroupJoin. I thought the problem was that when a new word was created there was a race condition inside Join between creating a new window and ending the current one. So here I tried to eliminate that by pairing every word with a null that signifies the end of the window. It doesn't work, just like the first version didn't. How is it possible that a new window is created for each word without the previous one being closed first? Completely confused.
static void Main(string[] args) {
var lgr = new DelegateLogger(Console.WriteLine);
var word = new Subject<string>();
var wordDelimited =
word
.Select(i => Observable.Return<string>(null).StartWith(i))
.SelectMany(i => i);
var wordStart = wordDelimited.Where(i => i != null);
var wordEnd = wordDelimited.Where(i => i == null);
var report = Observable
.Join(
wordStart.Log(lgr, "word"), // starts window
wordStart.Select(i => i.Length),
s => wordEnd.Log(lgr, "expireWord"), // ends current window
s => Observable.Empty<int>(),
(l, r) => new { Word = l, Length = r });
report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
word.OnNext("Apple");
word.OnNext("Banana");
word.OnNext("Cat");
word.OnNext("Zebra");
word.OnNext("Elephant");
word.OnNext("Bear");
Console.ReadLine();
}

RX misunderstood behavior

I have the repro code below, which demonstrates a problem in a more complex flow:
static void Main(string[] args)
{
var r = Observable.Range(1, 10).Finally(() => Console.WriteLine("Disposed"));
var x = Observable.Create<int>(o =>
{
for (int i = 1; i < 11; i++)
{
o.OnNext(i);
}
o.OnCompleted();
return Disposable.Create(() => Console.WriteLine("Disposed"));
});
var src = x.Publish().RefCount();
var a = src.Where(i => i % 2 == 0).Do(i => Console.WriteLine("Pair:" + i));
var b = src.Where(i => i % 2 != 0).Do(i => Console.WriteLine("Even:" + i));
var c = Observable.Merge(a, b);
using (c.Subscribe(i => Console.WriteLine("final " + i), () => Console.WriteLine("Complete")))
{
Console.ReadKey();
}
}
Running this snippet with r as the source (var src = r.Publish().RefCount()) produces all the numbers from 1 to 10.
Switching the source to x (as in the example) produces only the pairs, i.e. only the first observable to subscribe receives values, unless I change Publish() to Replay().
Why? What is the difference between r and x?
Thanks.
Although I do not have the patience to sort through the Rx.NET source code to find exactly what implementation detail causes this exact behavior, I can provide the following insight:
The difference in behavior you are seeing is caused by a race condition. The racers in this case are the subscriptions of a and b, which happen as a result of your subscription to the observable returned by Observable.Merge. You subscribe to c, which in turn subscribes to a and b. a and b are defined in terms of a Publish and RefCount of either x or r, depending on which case you choose.
Here's what's happening.
src = x
In this case, you are using a custom Observable. When subscribed to, your custom observable immediately and synchronously calls OnNext with the numbers 1 through 10, and then calls OnCompleted. Interestingly enough, this subscription is caused by your Publish().RefCount() Observable when it is subscribed to for the first time. It is subscribed to the first time by a, because a is the first parameter to Merge. So, before Merge has even subscribed to b, your subscription has already completed. Merge then subscribes to b, which is the RefCount observable. That observable has already completed, so Merge looks for the next Observable to merge. Since there are no more Observables to merge, and because all of the existing Observables have completed, the merged observable completes.
The values onNext'd through your custom observable have traveled through the "pairs" observable, but not the "evens" observable. Therefore, you end up with the following:
// "pairs" (has this been named incorrectly?)
[2, 4, 6, 8, 10]
src = r
In this case, you are using the built-in Range method to create an Observable. When subscribed to, this Range Observable does something that eventually ends up yielding the numbers 1 through 10. Interesting. We haven't a clue what's happening in that method, or when it's happening. We can, however, make some observations about it. If we look at what happens when src = x (above), we can see that only the first subscription takes effect because the observable is yielding immediately and synchronously. Therefore, we can determine that the Range Observable must not be yielding in the same manner, but instead allows the application's control flow to execute the subscription to b before any values are yielded. The difference between your custom Observable and this Range Observable is probably that the Range Observable is scheduling the yields to happen on the CurrentThread Scheduler.
How to avoid this kind of race condition:
var src = x.Publish(); // Publish() only, no RefCount()
var a = src.Where(...);
var b = src.Where(...);
var c = Observable.Merge(a, b);
var subscription = c.Subscribe(i => Console.WriteLine("final " + i), () => Console.WriteLine("Complete"));
// don't dispose of the subscription. The observable creates an auto-disposing subscription which will call dispose once `OnCompleted` or `OnError` is called.
src.Connect(); // connect to the underlying observable *after* Merge has subscribed to both a and b.
Notice that the solution to fixing the subscription to this composition of Observables was not to change how the source observable works, but instead to make sure your subscription logic isn't allowing any race conditions to exist. This is important, because trying to fix this problem in the Observable is simply changing behavior, not fixing the race. Had we changed the source and switched it out later, the subscription logic would still be buggy.
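To see the race in isolation, here is a toy model in plain JavaScript. `syncSource`, `publish` and `run` are hypothetical stand-ins, not Rx.NET: the source emits 1 to 10 synchronously on connect (like x in the question), and run either auto-connects on the first subscription (RefCount-style) or connects explicitly after both a and b have subscribed (Publish() plus Connect(), as in this answer).

```javascript
// Cold source that emits 1..10 synchronously as soon as it is connected.
const syncSource = observer => {
  for (let i = 1; i <= 10; i++) observer.next(i);
  observer.complete();
};

// Toy publish(): one shared list of observers, fed only after connect().
function publish(source) {
  const observers = [];
  let completed = false;
  return {
    subscribe(observer) {
      if (completed) observer.complete(); // late subscribers only see completion
      else observers.push(observer);
    },
    connect() {
      source({
        next: v => observers.slice().forEach(o => o.next(v)),
        complete: () => {
          completed = true;
          observers.slice().forEach(o => o.complete());
        },
      });
    },
  };
}

// Subscribes "a" (evens) then "b" (odds) and returns what reaches the merged output.
function run(connectOnFirstSubscribe) {
  const received = [];
  const src = publish(syncSource);
  let connected = false;
  const subscribeWhere = predicate => {
    src.subscribe({ next: v => { if (predicate(v)) received.push(v); }, complete: () => {} });
    if (connectOnFirstSubscribe && !connected) { // RefCount-style auto-connect
      connected = true;
      src.connect();
    }
  };
  subscribeWhere(v => v % 2 === 0); // a
  subscribeWhere(v => v % 2 !== 0); // b
  if (!connectOnFirstSubscribe) src.connect(); // connect after both subscriptions
  return received;
}

console.log(run(true));  // only the even numbers: b subscribed after completion
console.log(run(false)); // all of 1..10
```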
I suspect it's the schedulers. This change causes the two to behave identically:
var x = Observable.Create<int>(o =>
{
NewThreadScheduler.Default.Schedule(() =>
{
for (int i = 1; i < 11; i++)
{
o.OnNext(i);
}
o.OnCompleted();
});
return Disposable.Create(() => Console.WriteLine("Disposed"));
});
Whereas using Scheduler.Immediate gives the same behavior as yours.

Need help optimizing a google apps script that labels emails

Gmail has an issue where conversation labels are not applied to new messages that arrive in the conversation thread. issue details here
We found a Google Apps Script that fixes the labels on individual messages in the Gmail Inbox to address this issue. The script is as follows:
function relabeller() {
var labels = GmailApp.getUserLabels();
for (var i = 0; i < labels.length; i++) {
Logger.log("label: " + i + " " + labels[i].getName());
var threads = labels[i].getThreads(0,100);
for (var j = 1; threads.length > 0; j++) {
Logger.log( (j - 1) * 100 + threads.length);
labels[i].addToThreads(threads);
threads = labels[i].getThreads(j*100, 100);
}
}
}
However, this script times out on mailboxes with more than 20,000 messages due to the 5-minute execution time limit on Google Apps Script.
Can anyone please suggest a way to optimize this script so that it doesn't timeout?
OK, I've been working on this for a few days because I was really frustrated with the strange way that Gmail labels/doesn't label messages in conversations.
I'm flabbergasted actually that labels aren't automatically applied to new messages in a conversation. This is not reflected at all in the Gmail UI. There's no way to look at a thread and determine that the labels only apply to some messages in the thread, and you cannot add labels to a single message in the UI. As I was working through my script below, I noticed that you can't even programmatically add labels to a single message. So there really is no reason for the current behavior.
With my rant out of the way, I have a few notes about the script.
I sort of combined Saqib's code with Serge's code.
The script has two parts: an initial run that relabels all threads that have a user label attached, and a maintenance run that labels recent emails (currently it looks back 4 days). Only one part executes during a single run. Once the initial run is completed, only the maintenance part will run. You can set a trigger to run it once per day, or more or less often, depending on your needs.
The initial run halts after 4 minutes to avoid being terminated by the 5 minute script time limit. It sets a trigger to run again after 4 minutes (both of these times can be changed using constants in the script). The trigger gets deleted at the next run.
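The halt-and-resume idea can be sketched without any Apps Script services. In this hypothetical model, `processInBatches`, the fixed per-batch cost and the returned checkpoint object are assumptions; the real script does the relabelling inside the loop, stores its position in script properties and uses a time-based trigger to resume.

```javascript
// Processes threads in fixed-size batches from `startAt`, stopping before the
// next batch would exceed `budgetMs`. `msPerBatch` is an assumed constant cost.
function processInBatches(totalThreads, startAt, batchSize, budgetMs, msPerBatch) {
  let elapsed = 0;
  let next = startAt;
  while (next < totalThreads) {
    if (elapsed + msPerBatch > budgetMs) {
      return { done: false, resumeAt: next }; // save this and resume on the next run
    }
    // A real script would fetch and relabel threads next .. next + batchSize - 1 here.
    next += batchSize;
    elapsed += msPerBatch;
  }
  return { done: true, resumeAt: totalThreads };
}

// Chain runs the way the trigger would, until the whole mailbox is done.
let checkpoint = { done: false, resumeAt: 0 };
let runs = 0;
while (!checkpoint.done) {
  checkpoint = processInBatches(20000, checkpoint.resumeAt, 100, 4 * 60 * 1000, 5000);
  runs++;
}
console.log(runs); // 5
```

The per-batch cost must stay below the time budget; otherwise no batch would ever fit and the chain would never finish.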
There is no run-time check in the maintenance section. If you have lots of emails in the last 4 days, the maintenance section might hit the script time limit. I could probably change the script to be more efficient here, but so far it's worked for me so I am not really motivated to improve on it.
There's a try/catch statement in the initial run to try to catch the Gmail "write quota error" and exit gracefully (i.e. writing the current progress so it can be picked up again later), but I don't know if it works because I couldn't get the error to happen.
You'll get an email when the time limit is reached, and when the initial run is finished.
For some reason, the log doesn't always clear fully between runs, even when using the Logger.clear() command. So the status logs that it emails to the user have more than just the most recent run info. I don't know why this occurs.
I have used this to process 20,000 emails in around half an hour (including wait times). I actually ran it twice, so it processed 40,000 emails in one day. I guess the Gmail read/write limit of 10,000 isn't what is being applied here (maybe applying a label to 100 threads at a time counts as a single write event instead of 100?). It gets through about 5,000 threads in a 4 minute run, according to the status email it sends.
Sorry for the long lines. I blame the widescreen monitors. Let me know what you think!
function relabelGmail() {
var startTime= (new Date()).getTime(); // Time at start of script
var BATCH=100; // total number of threads to apply label to at once.
var LOOKBACKDAYS=4; // Days to look back for maintenance section of script. Should be at least 2
var MAX_RUN_TIME=4*60*1000; // Time in ms for max execution. 4 minutes is a good start.
var WAIT_TIME=4*60*1000; // Time in ms to wait before starting the script again.
Logger.clear();
// ScriptProperties.deleteAllProperties(); return; // Uncomment this line and run once to start over completely
if(ScriptProperties.getKeys().length==0){ // this is to create keys on the first run
ScriptProperties.setProperties({'itemsProcessed':0, 'initFinished':false, 'lastrun':'20000101', 'itemsProcessedToday':0,
'currentLabel':'null-label-NOTREAL', 'currentLabelStart':0, 'autoTrig':0, 'autoTrigID':'0'});
}
var itemsP = Number(ScriptProperties.getProperty('itemsProcessed')); // total counter
var initTemp = ScriptProperties.getProperty('initFinished'); // keeps track of when initial run is finished.
var initF = (initTemp.toLowerCase() == 'true'); // Make it boolean
var lastR = ScriptProperties.getProperty('lastrun'); // String of date corresponding to itemsProcessedToday in format yyyymmdd
var itemsPT = Number(ScriptProperties.getProperty('itemsProcessedToday')); // daily counter
var currentL = ScriptProperties.getProperty('currentLabel'); // Label currently being processed
var currentLS = Number(ScriptProperties.getProperty('currentLabelStart')); // Thread number to start on
var autoT = Number(ScriptProperties.getProperty('autoTrig')); // Number to say whether the last run made an automatic trigger
var autoTID = ScriptProperties.getProperty('autoTrigID'); // Unique ID of last written auto trigger
// First thing: Google terminates scripts after 5 minutes.
// If 4 minutes have passed, this script will terminate, write some data,
// and create a trigger to re-schedule itself to start again in a few minutes.
// If an auto trigger was created last run, it is deleted here.
if (autoT) {
var allTriggers = ScriptApp.getProjectTriggers();
// Loop over all triggers. If the trigger isn't found, then it must have been deleted.
for(var i=0; i < allTriggers.length; i++) {
if (allTriggers[i].getUniqueId() == autoTID) {
// Found the trigger and now delete it
ScriptApp.deleteTrigger(allTriggers[i]);
break;
}
}
autoT = 0;
autoTID = '0';
}
var today = dateToStr_();
if (today != lastR) { // If it's a new day, reset the daily counter
itemsPT = 0;
}
if (!initF) { // Don't do any of this if the initial run has been completed
var labels = GmailApp.getUserLabels();
// Find position of last label attempted
var curLnum=0;
for ( ; curLnum < labels.length; curLnum++) {
if (labels[curLnum].getName() == currentL) {break};
}
if (curLnum == labels.length) { // If label isn't found, start over at the beginning
curLnum = 0;
currentLS = 0;
itemsP=0;
currentL=labels[0].getName();
}
// Now start working through the labels until the quota is hit.
// Use a try/catch to stop execution if your quota has been hit.
// Google can actually automatically email you, but we need to clean up a bit before terminating the script so it can properly pick up again tomorrow.
try {
for (var i = curLnum; i < labels.length; i++) {
currentL = labels[i].getName(); // Next label
Logger.log('label: ' + i + ' ' + currentL);
var threads = labels[i].getThreads(currentLS,BATCH);
for (var j = Math.floor(currentLS/BATCH); threads.length > 0; j++) {
var currTime = (new Date()).getTime();
if (currTime-startTime > MAX_RUN_TIME) {
// Make the auto-trigger
autoT = 1; // So the auto trigger gets deleted next time.
var autoTrigger = ScriptApp.newTrigger('relabelGmail')
.timeBased()
.at(new Date(currTime+WAIT_TIME))
.create();
autoTID = autoTrigger.getUniqueId();
// Now write all the values.
ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT,
'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});
// Send an email
var emailAddress = Session.getActiveUser().getEmail();
GmailApp.sendEmail(emailAddress, 'Relabel job in progress', 'Your Gmail Relabeller has halted to avoid termination due to excess ' +
'run time. It will run again in ' + WAIT_TIME/1000/60 + ' minutes.\n\n' + itemsP + ' threads have been processed. ' + itemsPT +
' have been processed today.\n\nSee the log below for more information:\n\n' + Logger.getLog());
return;
} else {
// keep on going
var len = threads.length;
Logger.log( j * BATCH + len);
labels[i].addToThreads(threads);
currentLS = currentLS + len;
itemsP = itemsP + len;
itemsPT = itemsPT + len;
threads = labels[i].getThreads( (j+1) * BATCH, BATCH);
}
}
currentLS = 0; // Reset LS counter
}
initF = true; // Initial run is done
} catch (e) { // Clean up and send off a notice.
// Write current values back to ScriptProperties
ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT,
'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});
var emailAddress = Session.getActiveUser().getEmail();
var errorDate = new Date();
GmailApp.sendEmail(emailAddress, 'Error "' + e.name + '" in Google Apps Script', 'Your Gmail Relabeller has failed in the following stack:\n\n' +
e.stack + '\nThis may be due to reaching your daily Gmail read/write quota. \nThe error message is: ' +
e.message + '\nThe error occurred at the following date and time: ' + errorDate + '\n\nThus far, ' +
itemsP + ' threads have been processed. ' + itemsPT + ' have been processed today. \nSee the log below for more information:' +
'\n\n' + Logger.getLog());
return;
}
// Write current values back to ScriptProperties. Send completion email.
ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT,
'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID}); // key must match the 'autoTrigID' read at startup
var emailAddress = Session.getActiveUser().getEmail();
GmailApp.sendEmail(emailAddress, 'Relabel job completed', 'Your Gmail Relabeller has finished its initial run.\n' +
'If you continue to run the script, it will skip the initial run and instead relabel ' +
'all emails from the previous ' + LOOKBACKDAYS + ' days.\n\n' + itemsP + ' threads were processed. ' + itemsPT +
' were processed today. \nSee the log below for more information:' + '\n\n' + Logger.getLog());
return; // Don't run the maintenance section after initial run finish
} // End initial run section statement
// Below is the 'maintenance' section that will be run when the initial run is finished. It finds all new threads
// (as defined by LOOKBACKDAYS) and applies any existing labels to all messages in each thread. Note that this
// won't miss older threads that are labeled by the user because all messages in a thread get the label
// when the label action is first performed. If another message is then sent or received in that thread,
// then this maintenance section will find it because it will be deemed a "new" thread at that point.
// You may need to search further back the first time you run this if it took more than 3 days to finish
// the initial run. For general maintenance, though, 4 days should be plenty.
// Note that I have not implemented a script-run-time check for this section.
var threads = GmailApp.search('newer_than:' + LOOKBACKDAYS + 'd', 0, BATCH); // First page of recent threads
var len = threads.length;
for (var i=0; len > 0; i++) {
for (var t = 0; t < len; t++) {
var labels = threads[t].getLabels();
for (var l = 0; l < labels.length; l++) { // Add each label to the thread
labels[l].addToThread(threads[t]);
}
}
itemsP = itemsP + len;
itemsPT = itemsPT + len;
threads = GmailApp.search('newer_than:' + LOOKBACKDAYS + 'd', (i+1) * BATCH, BATCH);
len = threads.length;
}
// Write the property data
ScriptProperties.setProperties({'itemsProcessed':itemsP, 'initFinished':initF, 'lastrun':today, 'itemsProcessedToday':itemsPT,
'currentLabel':currentL, 'currentLabelStart':currentLS, 'autoTrig':autoT, 'autoTrigID':autoTID});
}
// Takes a date object and turns it into a string of form yyyymmdd
function dateToStr_(dateObj) { //takes in a date object, but uses current date if not a date
if (!(dateObj instanceof Date)) {
dateObj = new Date();
}
var dd = dateObj.getDate();
var mm = dateObj.getMonth()+1; //January is 0!
var yyyy = dateObj.getFullYear();
if(dd<10){dd='0'+dd};
if(mm<10){mm='0'+mm};
var dateStr = ''+yyyy+mm+dd; // declared with var so it doesn't leak into the global scope
return dateStr;
}
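The core of relabelGmail() is a checkpoint-and-resume loop: work in batches, check the elapsed time before each batch, and once the budget is used up, persist a cursor and let a trigger start the next run. The sketch below isolates that pattern in plain JavaScript so it can run outside Apps Script; runWithBudget, the fake clock, and the in-memory items are hypothetical stand-ins, not Apps Script APIs.

```javascript
// Minimal sketch of the checkpoint-and-resume loop in relabelGmail().
// `now` is an injectable clock and `processBatch` stands in for
// label.addToThreads(); both are hypothetical.
function runWithBudget(state, items, opts) {
  const { batchSize, budgetMs, now, processBatch } = opts;
  const startTime = now();
  while (state.cursor < items.length) {
    if (now() - startTime > budgetMs) {
      // Out of time: the real script saves its state to ScriptProperties
      // and creates a one-off time-based trigger at this point.
      return false;
    }
    const batch = items.slice(state.cursor, state.cursor + batchSize);
    processBatch(batch);
    state.cursor += batch.length; // like currentLabelStart += len
  }
  return true; // everything processed
}

// Fake clock that advances by stepMs every time it is read.
function makeFakeClock(stepMs) {
  let t = 0;
  return () => (t += stepMs);
}

// Usage: 250 items, batches of 100, and a budget that allows one batch per run.
const items = Array.from({ length: 250 }, (_, i) => i);
const state = { cursor: 0 }; // would be persisted between runs
const seen = [];
let runs = 0;
let finished = false;
while (!finished) {
  runs += 1; // each iteration stands in for one scheduled run
  finished = runWithBudget(state, items, {
    batchSize: 100,
    budgetMs: 150,
    now: makeFakeClock(100),
    processBatch: (b) => seen.push(...b),
  });
}
console.log(runs, seen.length); // 3 250
```

Because the cursor lives outside the function, a run that stops early loses nothing: the next run simply continues from the saved position.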
Edit: 3/24/2017
I guess I should turn on notifications or something, because I never saw the question from user29020. In case anyone ever has the same question, here's what I do: I run it as a maintenance function by setting a daily trigger to run each night between 1 and 2 AM.
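The nightly schedule described above can also be created from code instead of through the editor's triggers page. This is a sketch, assuming the handler is relabelGmail() and that you run the installer once, after the initial run has finished: it deletes every existing trigger whose handler is relabelGmail, which would include a pending one-off resume trigger created by the script.

```javascript
// Google Apps Script sketch: (re)install a single daily trigger for
// relabelGmail() in the 1 AM hour of the script's time zone.
// Caution: deletes ALL existing triggers whose handler is relabelGmail,
// including any one-off "resume" trigger the script created itself.
function installNightlyRelabelTrigger() {
  var handler = 'relabelGmail';
  var triggers = ScriptApp.getProjectTriggers();
  for (var i = 0; i < triggers.length; i++) {
    if (triggers[i].getHandlerFunction() === handler) {
      ScriptApp.deleteTrigger(triggers[i]);
    }
  }
  ScriptApp.newTrigger(handler)
    .timeBased()
    .everyDays(1)
    .atHour(1) // Apps Script picks a time within the 1 AM hour
    .create();
}
```

Deleting before creating keeps the installer idempotent, so running it twice does not leave two nightly triggers behind.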
An additional note: it seems that at some point in the last year or so, labeling calls to Gmail have slowed down significantly. It now takes around 0.2 seconds per thread, so 20k emails amount to roughly 4,000 seconds of labeling, which I would expect to take about 17 to 20 four-minute runs to get all the way through. This also means that if you typically receive more than 100-200 emails a day, the maintenance section might start to take too long and fail. That's a lot of emails, but I bet some people receive that many, and it seems much more likely that you would hit that than the 1,000 or so daily emails that would have been needed to cause a failure back when I first wrote the script.
Anyway, one mitigation would be to reduce LOOKBACKDAYS to less than 4, but I wouldn't recommend setting it below 2.
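The run-count arithmetic in the note above can be checked with a few lines of JavaScript. This is a rough sketch under stated assumptions: labeling is treated as the only cost, each run spends its whole 4-minute budget (MAX_RUN_TIME) labeling, so the result is a lower bound. maintenanceSeconds additionally assumes, hypothetically, that the per-call cost applies to every addToThread() call in the maintenance loop, which makes one call per label per thread.

```javascript
// Lower-bound estimate: runs needed if labeling time were the only cost.
function runsNeeded(threads, secondsPerThread, runSeconds) {
  return Math.ceil((threads * secondsPerThread) / runSeconds);
}

// Seconds the maintenance section would spend, assuming (hypothetically)
// that the per-call cost applies once per label per thread.
function maintenanceSeconds(lookbackDays, threadsPerDay, labelsPerThread, secondsPerCall) {
  return lookbackDays * threadsPerDay * labelsPerThread * secondsPerCall;
}

const RUN_SECONDS = 4 * 60; // MAX_RUN_TIME in relabelGmail(), in seconds
console.log(runsNeeded(20000, 0.2, RUN_SECONDS)); // 17
```

Comparing maintenanceSeconds(LOOKBACKDAYS, yourDailyVolume, labelsPerThread, 0.2) against the run budget gives a quick feel for how close the maintenance section is to the limit, and why lowering LOOKBACKDAYS helps.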
From the documentation:
getInboxThreads()
Retrieve all Inbox threads irrespective of labels
This call will fail when the size of all threads is too large for the system to handle. Where the thread size is unknown, and potentially very large, please use the 'paged' call, and specify ranges of the threads to retrieve in each call.
So you should process a limited number of threads per run, label their messages, and set up a time-driven trigger to run each "page" every 10 minutes or so until all the messages are labelled.
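The paged approach described above boils down to "process one page per trigger run and remember where to start next time". A minimal sketch in plain JavaScript, with a simulated inbox; runOnePage and fetchPage are hypothetical stand-ins for a trigger handler and for GmailApp.getInboxThreads(start, max).

```javascript
// Hypothetical sketch (no Gmail calls): each invocation processes one page
// and saves the next start offset, as a time-driven trigger handler would.
function runOnePage(state, fetchPage, pageSize, handleThread) {
  const page = fetchPage(state.start, pageSize);
  if (page.length === 0) {
    return true; // no more threads: the job is finished
  }
  page.forEach(handleThread);
  state.start += page.length; // advance by what was actually fetched
  return false;
}

// Simulated inbox of 230 threads, fetched 100 at a time.
const inbox = Array.from({ length: 230 }, (_, i) => `thread-${i}`);
const fetchPage = (start, max) => inbox.slice(start, start + max);
const state = { start: 0 }; // would live in script properties between runs
const handled = [];
let invocations = 0;
let done = false;
while (!done) {
  invocations += 1; // each iteration stands in for one trigger firing
  done = runOnePage(state, fetchPage, 100, (t) => handled.push(t));
}
console.log(invocations, handled.length); // 4 230
```

The last invocation finds an empty page, which is the signal to stop (or, in the script, to send the completion email).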
EDIT: I have given this a try; please consider it a draft to start with:
The script will process 100 threads at a time and send you an email to inform you on its progress and show the log.
When it's finished, it will notify you by email as well. It uses ScriptProperties to store its state (don't forget to update the mail address at the end of the script). I tried it with a time trigger set to 5 minutes, and it seems to run smoothly for now...
function inboxLabeller() {
if(ScriptProperties.getKeys().length==0){ // this is to create keys on the first run
ScriptProperties.setProperties({'threadStart':0, 'itemsprocessed':0, 'notF':true})
}
var items = Number(ScriptProperties.getProperty('itemsprocessed'));// total counter
var tStart = Number(ScriptProperties.getProperty('threadStart'));// the value to start with
var notFinished = (ScriptProperties.getProperty('notF') == 'true');// the "main switch" ;-) (properties come back as strings)
Logger.clear()
while (notFinished){ // the main loop
var threads = GmailApp.getInboxThreads(tStart,100);
Logger.log('Number of threads='+Number(tStart+threads.length));
if(threads.length==0){
notFinished=false ;
break
}
for(var t=0;t<threads.length;++t){
var mCount = threads[t].getMessageCount();
var mSubject = threads[t].getFirstMessageSubject();
var labels = threads[t].getLabels();
var labelsNames = '';
for(var l=0;l<labels.length;l++){labelsNames+=labels[l].getName()+' '}
Logger.log('subject '+mSubject+' has '+mCount+' msgs with labels '+labelsNames)
for(var l=0;l<labels.length;l++){
labels[l].addToThread(threads[t])
}
}
tStart = tStart+100;
items = items+threads.length // count what was actually fetched (the last page may hold fewer than 100)
ScriptProperties.setProperties({'threadStart':tStart, 'itemsprocessed':items})
break
}
if(notFinished){
GmailApp.sendEmail('mymail', 'inboxLabeller progress report', 'Still working, '+items+' processed \n - see logger below \n \n'+Logger.getLog());
}else{
GmailApp.sendEmail('mymail', 'inboxLabeller End report', 'Job completed : '+items+' processed');
ScriptProperties.setProperties({'threadStart':0, 'itemsprocessed':0, 'notF':true})
}
}
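Script properties are stored and returned as strings, which is why relabelGmail() converts 'initFinished' with toLowerCase() == 'true' and wraps its counters in Number(). inboxLabeller() originally read 'notF' without converting it; that only worked because it is never saved as false, since the string "false" is truthy. A small sketch of explicit conversion helpers; makeStringStore, readBool, and readNumber are hypothetical names, and the store is an in-memory stand-in with the same getProperty/setProperties shape.

```javascript
// Hypothetical in-memory stand-in for a script-properties store: like the
// real one, it keeps every value as a string.
function makeStringStore() {
  const data = {};
  return {
    getProperty: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setProperties: (props) => {
      for (const key of Object.keys(props)) data[key] = String(props[key]);
    },
  };
}

// Convert a stored string back to a boolean ("true"/"false", any case).
function readBool(store, key, fallback) {
  const raw = store.getProperty(key);
  return raw === null ? fallback : raw.toLowerCase() === 'true';
}

// Convert a stored string back to a number; Number(null) would be 0, so check null first.
function readNumber(store, key, fallback) {
  const raw = store.getProperty(key);
  if (raw === null) return fallback;
  const n = Number(raw);
  return Number.isNaN(n) ? fallback : n;
}

const store = makeStringStore();
store.setProperties({ threadStart: 200, notF: false });
console.log(Boolean(store.getProperty('notF'))); // true: the string "false" is truthy
console.log(readBool(store, 'notF', true));       // false
console.log(readNumber(store, 'threadStart', 0)); // 200
```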
This will find individual messages that do not have a label and apply their thread's labels to them. It takes much less time because it's not relabeling every single message.
function label_unlabeled_messages() {
var unlabeled = GmailApp.search("has:nouserlabels -label:inbox -label:sent -label:chats -label:draft -label:spam -label:trash");
for (var i = 0; i < unlabeled.length; i++) {
Logger.log("thread: " + i + " " + unlabeled[i].getFirstMessageSubject());
var labels = unlabeled[i].getLabels();
for (var j = 0; j < labels.length; j++) {
Logger.log("labels: " + i + " " + labels[j].getName());
labels[j].addToThread(unlabeled[i]);
}
}
}
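Why the targeted fix-up does less work can be seen in a toy in-memory model. This is a hypothetical sketch, not the Gmail API: each thread holds messages, each message has its own set of user labels, and a thread's labels are modeled as the union of its messages' labels. Only threads containing an unlabeled message are touched.

```javascript
// Toy model (hypothetical): a thread's labels = union of its messages' labels.
function threadLabels(thread) {
  const all = new Set();
  for (const m of thread.messages) m.labels.forEach((l) => all.add(l));
  return all;
}

// Copy the thread's labels onto every message, but only for threads that
// contain at least one message with no user labels. Returns threads touched.
function labelUnlabeled(threads) {
  let touched = 0;
  for (const t of threads) {
    if (!t.messages.some((m) => m.labels.size === 0)) continue; // already consistent
    const labels = threadLabels(t);
    if (labels.size === 0) continue; // nothing to copy
    for (const m of t.messages) labels.forEach((l) => m.labels.add(l));
    touched += 1;
  }
  return touched;
}

const threads = [
  { messages: [{ labels: new Set(['work']) }, { labels: new Set() }] }, // new, unlabeled reply
  { messages: [{ labels: new Set(['home']) }] },                        // already labeled
  { messages: [{ labels: new Set() }] },                                 // no labels anywhere
];
const touched = labelUnlabeled(threads);
console.log(touched); // 1
```

Running it a second time touches nothing, which mirrors why repeated runs of the search-based version stay cheap.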