Expected behavior when multiple things happen together in select

Assuming one goroutine is waiting on the following select on two unbuffered channels one and two
select {
case <-one:
fmt.Println("read from one")
case <-two:
fmt.Println("read from two")
}
and one goroutine is waiting on the following send
one <- 1
and another is waiting on the following
two <- 2
The first goroutine waiting on the select implies that a send on either channel, one or two, could proceed. Which select case is then guaranteed to run? Is it deterministic, or can either case run, leaving the other channel with one unread value at the end?
If there is only one guaranteed net output, then do selects ensure a total order across all operations on all the channels participating in the select? That seems very inefficient.
For example in the following code
package main
import (
"fmt"
"time"
"sync"
)
func main() {
one_net := 0
two_net := 0
var mtx = &sync.Mutex{}
for i := 0; i < 8; i++ {
one, two := make(chan int), make(chan int)
go func() { // go routine one
select {
case <-one:
fmt.Println("read from one")
mtx.Lock()
one_net++
mtx.Unlock()
case <-two:
fmt.Println("read from two")
mtx.Lock()
two_net++
mtx.Unlock()
}
}()
go func() { // go routine two
one <- 1
mtx.Lock()
one_net--
mtx.Unlock()
fmt.Println("Wrote to one")
}()
go func() { // go routine three
two <- 2
mtx.Lock()
two_net--
mtx.Unlock()
fmt.Println("Wrote to two")
}()
time.Sleep(time.Millisecond)
}
mtx.Lock()
fmt.Println("one_net", one_net)
fmt.Println("two_net", two_net)
mtx.Unlock()
}
Can there even be a mismatch in the number of reads vs the number of writes (i.e. can one_net and two_net be nonzero at the end)? For example, in the case where the select statement is waiting on a read from both channels, and then goroutines two and three go through with their respective writes, but then the select only picks up on one of those writes.

The Go Programming Language Specification
Select statements
A "select" statement chooses which of a set of possible send or
receive operations will proceed.
If one or more of the communications can proceed, a single one that
can proceed is chosen via a uniform pseudo-random selection.
Your question is imprecise; see How to create a Minimal, Complete, and Verifiable example. For example,
chan.go:
package main
import (
"fmt"
"time"
)
func main() {
fmt.Println()
for i := 0; i < 8; i++ {
one, two := make(chan int), make(chan int)
go func() { // goroutine one
select {
case <-one:
fmt.Println("read from one")
case <-two:
fmt.Println("read from two")
}
select {
case <-one:
fmt.Println("read from one")
case <-two:
fmt.Println("read from two")
}
fmt.Println()
}()
go func() { // goroutine two
one <- 1
}()
go func() { // goroutine three
two <- 2
}()
time.Sleep(time.Millisecond)
}
}
Output:
$ go run chan.go
read from two
read from one
read from one
read from two
read from one
read from two
read from two
read from one
read from one
read from two
read from two
read from one
read from one
read from two
read from two
read from one
$
What behavior do you expect and why?
The Go Programming Language Specification
Channel types
A channel provides a mechanism for concurrently executing functions to
communicate by sending and receiving values of a specified element
type.
A new, initialized channel value can be made using the built-in
function make, which takes the channel type and an optional capacity
as arguments:
make(chan int, 100)
The capacity, in number of elements, sets the size of the buffer in
the channel. If the capacity is zero or absent, the channel is
unbuffered and communication succeeds only when both a sender and
receiver are ready. Otherwise, the channel is buffered and
communication succeeds without blocking if the buffer is not full
(sends) or not empty (receives). A nil channel is never ready for
communication.
Go statements
A "go" statement starts the execution of a function call as an
independent concurrent thread of control, or goroutine, within the
same address space.
The function value and parameters are evaluated as usual in the
calling goroutine, but unlike with a regular call, program execution
does not wait for the invoked function to complete. Instead, the
function begins executing independently in a new goroutine. When the
function terminates, its goroutine also terminates. If the function
has any return values, they are discarded when the function completes.
Analyzing your new example:
The channels are unbuffered. Goroutines two and three wait on goroutine one. A send on an unbuffered channel waits until there is a pending receive. When the goroutine one select is evaluated, there will be a pending receive on either channel one or channel two. The goroutine, two or three, that sends on that channel can now send and terminate. Goroutine one can now execute a receive on that channel and terminate. As a crude goroutine synchronization mechanism, goroutine main sleeps for one millisecond on each iteration; when main finally returns, the program exits, which terminates any other goroutines. That includes the goroutine, two or three, that didn't get to send because it is still waiting for a pending receive.
You ask "can there even be a mismatch in the number of reads vs the number of writes (i.e. can one_net and two_net be non 0 at the end)? For example in the case where the select statement is waiting on a read from both channels, and then goroutines two and three go through with their respective writes, but then the select only picks up on one of those writes."
Only one of goroutines two and three gets to send (write). There will be exactly one (send) write and one (receive) read. This assumes that goroutine main does not terminate before this occurs, that is, it occurs within one millisecond.
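The claim above can be checked with a small sketch. It assumes the same three goroutines as in the question; the names runOnce, recvCh and sentCh are illustrative and not part of the original code. Each goroutine reports what it did on a buffered reporting channel, so the report itself never blocks.

```go
package main

import (
	"fmt"
	"time"
)

// runOnce mirrors the question's setup: one goroutine selects on two
// unbuffered channels while two other goroutines each try to send.
// It reports which channel was received from, which send completed,
// and whether a second send ever completed.
func runOnce() (received, sent string, extraSend bool) {
	one, two := make(chan int), make(chan int)
	recvCh := make(chan string, 1) // hypothetical reporting channels
	sentCh := make(chan string, 2)

	go func() { // goroutine one
		select {
		case <-one:
			recvCh <- "one"
		case <-two:
			recvCh <- "two"
		}
	}()
	go func() { one <- 1; sentCh <- "one" }() // goroutine two
	go func() { two <- 2; sentCh <- "two" }() // goroutine three

	received = <-recvCh
	sent = <-sentCh
	select {
	case <-sentCh:
		extraSend = true
	case <-time.After(20 * time.Millisecond):
		// The other sender is still blocked: no receive is left for it.
	}
	return received, sent, extraSend
}

func main() {
	for i := 0; i < 4; i++ {
		received, sent, extra := runOnce()
		fmt.Println(received == sent, extra) // true false
	}
}
```

The send that completes is always on the channel the select received from, so the per-channel read and write counts stay balanced. The sender that loses stays blocked until the program exits.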

As peterSO points out, selection among multiple channels that are ready at the same time is pseudo-random.
However, it is important to notice that in most cases, you will have race conditions between the sending and/or receiving goroutines, which also introduces nondeterminism.
In fact, peterSO's example illustrates this very situation; at the point where the receiving goroutine reaches the first select statement, there is no guarantee whether either or both of the sending goroutines have reached their respective send statement. The relevant snippet follows, with some added comments:
a, b := make(chan int), make(chan int)
go func() { // goroutine one
// At this point, any or none of the channels could be ready.
select {
case <-a:
fmt.Println("read from a")
case <-b:
fmt.Println("read from b")
}
// At this point, we will have read one, and will block waiting for the other.
select {
case <-a:
fmt.Println("read from a")
case <-b:
fmt.Println("read from b")
}
fmt.Println()
}()
go func() { // goroutine two
a <- 1 // Does this execute first?
}()
go func() { // goroutine three
b <- 2 // ...or does this?
}()
In general, when writing concurrent programmes, one should avoid relying on concurrent events happening in any particular determined order. Unless your program logic serialises things, as a rule of thumb, consider them happening in an indeterminate (though not necessarily random and evenly distributed) order, and you will be safe more often than sorry.
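To observe the pseudo-random choice on its own, without the scheduling race described above, here is a minimal sketch that makes both channels ready before the select runs. It uses buffered channels for that purpose, which is a deliberate departure from the unbuffered channels in the question; countSelections is an illustrative name.

```go
package main

import "fmt"

// countSelections runs a two-case select n times with both channels
// already holding a value, and counts how often each case is chosen.
func countSelections(n int) (fromA, fromB int) {
	for i := 0; i < n; i++ {
		a, b := make(chan int, 1), make(chan int, 1)
		a <- 1 // both receives can proceed before select is evaluated
		b <- 2
		select {
		case <-a:
			fromA++
		case <-b:
			fromB++
		}
	}
	return fromA, fromB
}

func main() {
	fromA, fromB := countSelections(10000)
	fmt.Println("total selections:", fromA+fromB)
	fmt.Println("both cases chosen at least once:", fromA > 0 && fromB > 0)
}
```

Exactly one case runs per select, so the two counts always add up to n; how they split varies from run to run.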

Related

Why is a monitor implemented in terms of semaphores this way?

I have trouble understanding the implementation of a monitor in terms of semaphores from Operating System Concepts
5.8.3 Implementing a Monitor Using Semaphores
We now consider a possible implementation of the monitor mechanism
using semaphores.
For each monitor, a semaphore mutex (initialized to 1) is provided. A
process must execute wait(mutex) before entering the monitor and must
execute signal(mutex) after leaving the monitor.
Since a signaling process must wait until the resumed process either leaves or waits, an additional semaphore, next, is introduced,
initialized to 0. The signaling processes can use next to suspend
themselves. An integer variable next_count is also provided to count
the number of processes suspended on next. Thus, each external
function F is replaced by
wait(mutex);
...
body of F
...
if (next_count > 0)
signal(next);
else
signal(mutex);
Mutual exclusion within a monitor is ensured.
We can now describe how condition variables are implemented as well.
For each condition x, we introduce a semaphore x_sem and an
integer variable x_count, both initialized to 0. The operation x.wait() can now be implemented as
x_count++;
if (next_count > 0)
signal(next);
else
signal(mutex);
wait(x_sem);
x_count--;
The operation x.signal() can be implemented as
if (x_count > 0) {
next_count++;
signal(x_sem);
wait(next);
next_count--;
}
What is the reason for introducing the semaphore next and the count next_count of processes suspended on next?
Why are x.wait() and x.signal() implemented the way they are?
Thanks.

------- Note -------
WAIT() and SIGNAL() denote calls on monitor methods
wait() and signal() denote calls to semaphore methods, in the explanation that follows.
------- End of Note -------
I think it is easier if you think in terms of a concrete example. But before that let's first try to understand what a monitor is. As explained in the book, a monitor is an abstract data type, meaning that it is not a real type which can be used to instantiate a variable. Rather it is like a specification with some rules and guidelines based on which different languages could provide support for process synchronization.
Semaphores were introduced as a software-based solution for achieving synchronization over hardware-based approaches like TestAndSet() or Swap(). Even with semaphores, programmers had to ensure that they invoke the wait() & signal() methods correctly and in the right order. So an abstract specification called a monitor was introduced to encapsulate everything related to synchronization in one primitive, so that any process executing inside the monitor gets these semaphore wait and signal invocations made correctly on its behalf.
With monitors, all shared variables and the functions that use them are put into the monitor structure. When any of these functions is invoked, the monitor implementation takes care of protecting the shared resources through mutual exclusion and of any synchronization issues.
Now with monitors, unlike semaphores or other synchronization techniques, we are not dealing with just one critical section but with many of them, in the form of different functions. In addition, we also have shared variables that are accessed within these functions. To ensure that only one of the monitor's functions is executing at a time, and that no other process is executing any of them, we can use a global semaphore called mutex.
Consider the example of the solution for the dining philosophers problem using monitors below.
monitor DiningPhilosophers
{
enum {THINKING, HUNGRY, EATING} state[5];
condition self[5];
void pickup(int i) {
state[i] = HUNGRY;
test(i);
if (state[i] != EATING)
self[i].WAIT();
}
void putdown(int i) {
state[i] = THINKING;
test((i + 4) % 5);
test((i + 1) % 5);
}
void test(int i) {
if (
(state[(i + 4) % 5] != EATING) &&
(state[i] == HUNGRY) &&
(state[(i + 1) % 5] != EATING))
{
state[i] = EATING;
self[i].SIGNAL();
}
}
initialization_code() {
for (int i = 0; i < 5; i++)
state[i] = THINKING;
}
}
Ideally, how a process might invoke these functions would be in the following sequence:
DiningPhilosophers.pickup(i);
...
// do some work
...
DiningPhilosophers.putdown(i);
Now, whilst one process is executing inside the pickup() method another might try to invoke putdown() (or even the pickup) method. In order to ensure mutual exclusion we must ensure only one process is running inside the monitor at any given time. So, to handle these cases we have a global semaphore mutex that encapsulates all the invokable (pickup & putdown) methods. So these two methods will be implemented as follows:
void pickup(int i) {
// wait(mutex);
state[i] = HUNGRY;
test(i);
if (state[i] != EATING)
self[i].WAIT();
// signal(mutex);
}
void putdown(int i) {
// wait(mutex);
state[i] = THINKING;
test((i + 4) % 5);
test((i + 1) % 5);
// signal(mutex);
}
Now only one process will be able to execute inside the monitor in any of its methods. With this setup, if process P1 has executed pickup() (but has yet to put down the chopsticks) and then process P2 (say an adjacent diner) tries to pickup(), P2 has to wait() for the chopsticks (the shared resource) to become available, since they are in use. Let's look at the WAIT and SIGNAL implementation of the monitor's condition variables:
WAIT(){
x_count++;
if (next_count > 0)
signal(next);
else
signal(mutex);
wait(x_sem);
x_count--;
}
SIGNAL() {
if (x_count > 0) {
next_count++;
signal(x_sem);
wait(next);
next_count--;
}
}
The WAIT implementation of the condition variables is different from that of the semaphore's because it has to provide more functionality, like allowing other processes to invoke functions of the monitor (whilst it waits) by releasing the mutex global semaphore. So, when WAIT is invoked by P2 from the pickup() method, it will call signal(mutex), allowing other processes to invoke the monitor methods, and then call wait(x_sem) on the semaphore specific to the condition. Now P2 is blocked here. In addition, the variable x_count keeps track of the number of processes waiting on the condition variable (self).
So when P1 invokes putdown(), this will invoke SIGNAL via the test() method. Inside SIGNAL, when P1 invokes signal(x_sem) for the chopstick it holds, it must do one additional thing: it must ensure that only one process is running inside the monitor. If it only called signal(x_sem), then from that point onwards P1 and P2 would both be doing things inside the monitor. To prevent this, after releasing its chopstick P1 blocks itself until P2 finishes. To block itself, it uses the semaphore next. And to notify P2 or some other process that someone is blocked, it uses the counter next_count.
So, now P2 would get the chopsticks and before it exits the pickup() method it must release P1 who is waiting on P2 to finish. So now, we must change the pickup() method (and all functions of the monitor) as follows:
void pickup(int i) {
// wait(mutex);
state[i] = HUNGRY;
test(i);
if (state[i] != EATING)
self[i].WAIT();
/**************
if (next_count > 0)
signal(next);
else
signal(mutex);
**************/
}
void putdown(int i) {
// wait(mutex);
state[i] = THINKING;
test((i + 4) % 5);
test((i + 1) % 5);
/**************
if (next_count > 0)
signal(next);
else
signal(mutex);
**************/
}
So now, before any process exits a function of the monitor, it checks if there are any waiting processes and if so releases them and not the mutex global semaphore. And the last of such waiting processes will release the mutex semaphore allowing new processes to enter into the monitor functions.
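To make the hand-off concrete, here is a minimal Go sketch of the same scheme. It is an illustration under stated assumptions, not the book's code: the semaphore is modelled as a buffered channel whose capacity (8) is an arbitrary bound that is enough for this demo, and the names monitor, condition, enter, leave and demo are hypothetical. Lower-case wait/signal are the semaphore operations and upper-case Wait/Signal are the condition operations, following the note at the top of this answer.

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a counting semaphore modelled as a buffered channel.
type semaphore chan struct{}

func newSemaphore(initial int) semaphore {
	s := make(semaphore, 8) // capacity is an assumption for this demo
	for i := 0; i < initial; i++ {
		s <- struct{}{}
	}
	return s
}

func (s semaphore) wait()   { <-s }
func (s semaphore) signal() { s <- struct{}{} }

type monitor struct {
	mutex     semaphore // initialized to 1
	next      semaphore // initialized to 0; signalers suspend here
	nextCount int       // number of processes suspended on next
}

type condition struct {
	m      *monitor
	xSem   semaphore // initialized to 0
	xCount int       // number of processes waiting on this condition
}

func newMonitor() *monitor {
	return &monitor{mutex: newSemaphore(1), next: newSemaphore(0)}
}

func (m *monitor) newCondition() *condition {
	return &condition{m: m, xSem: newSemaphore(0)}
}

func (m *monitor) enter() { m.mutex.wait() }

// leave resumes a suspended signaler if there is one, else frees the mutex.
func (m *monitor) leave() {
	if m.nextCount > 0 {
		m.next.signal()
	} else {
		m.mutex.signal()
	}
}

func (c *condition) Wait() {
	c.xCount++
	c.m.leave() // same hand-off as when exiting a monitor function
	c.xSem.wait()
	c.xCount--
}

func (c *condition) Signal() {
	if c.xCount > 0 {
		c.m.nextCount++
		c.xSem.signal()
		c.m.next.wait() // suspend until the woken process leaves or waits
		c.m.nextCount--
	}
}

// demo records the order in which a waiter and a signaler run.
func demo() []string {
	m := newMonitor()
	ready := m.newCondition()
	var events []string // only touched while inside the monitor
	entered := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // waiter
		defer wg.Done()
		m.enter()
		close(entered) // the signaler may now try to enter
		ready.Wait()
		events = append(events, "waiter resumed")
		m.leave()
	}()
	go func() { // signaler
		defer wg.Done()
		<-entered
		m.enter() // blocks until the waiter releases mutex inside Wait
		events = append(events, "signaling")
		ready.Signal()
		events = append(events, "signaler resumed")
		m.leave()
	}()
	wg.Wait()
	return events
}

func main() {
	fmt.Println(demo()) // [signaling waiter resumed signaler resumed]
}
```

The recorded order shows the point of next: the signaler steps aside right after signaling, the woken waiter runs alone inside the monitor, and only when the waiter leaves is the signaler resumed.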
I know it's pretty long, but it took some time for me to understand and wanted to put it in writing. I will post it on a blog soon.
If there are any mistakes please let me know.
Best,
Shabir

I agree it's confusing.
Let's first understand the first piece of code:
// if you are the only process on the queue just take the monitor and invoke the function F.
wait(mutex);
...
body of F
...
if (next_count > 0)
// if a process is suspended on "next" (it called x.signal() and is waiting to resume), signal the "next" semaphore and hand the monitor to it.
signal(next);
else
// otherwise signal the "mutex" semaphore so that a process requesting the monitor later can enter.
signal(mutex);
Back to your questions:
What is the reason for introducing the semaphore next and the count
next_count of processes suspended on next?
When a process calls x.signal() and wakes a process waiting on condition x, two processes would be active inside the monitor at once. To prevent that, the signaling process suspends itself on the semaphore next and lets the woken process run.
next_count only keeps track of how many processes are suspended on next.
A process suspended on the next semaphore is therefore one that issued x.signal(); it stays suspended until the process it woke up either leaves the monitor or waits again, at which point signal(next) resumes it.
Why are x.wait() and x.signal() implemented the way they are?
Let's take the x.wait():
semaphore x_sem; // (initially = 0)
int x_count = 0; // number of processes waiting on condition (x)
/*
* Record that one more process is waiting on condition x. If some process
* calls x.signal() while no process is waiting on x, the signal is lost (it has no effect).
*/
x_count++;
/*
* If a signaling process is suspended on next,
* signal(next) resumes it so that it takes over the monitor.
*/
if (next_count > 0)
signal(next);
/*
* Otherwise, no process is waiting.
* signal(mutex) will release the mutex.
*/
else
signal(mutex);
/*
* now the process that called x.wait() will be blocked until other process will release (signal) the
* x_sem semaphore: signal(x_sem)
*/
wait(x_sem);
// process is back from blocking.
// we are done, decrease x_count.
x_count--;
Now let's take the x.signal():
// if there are processes waiting on condition x.
if (x_count > 0) {
// increase next_count because this signaling process is about to suspend itself on next.
next_count++;
// release x_sem so the process waiting on x condition resume.
signal(x_sem);
// wait until next process is done.
wait(next);
// we are done.
next_count--;
}
Comment if you have any questions.

Why does this select always run the default case when the first case actually is executed?

I'm trying to get a better understanding of golang channels. While reading this article I'm toying with non-blocking sends and have come up with the following code:
package main
import (
"fmt"
"time"
)
func main() {
stuff := make(chan int)
go func(){
for i := 0; i < 5; i ++{
select {
case stuff <- i:
fmt.Printf("Sent %v\n", i)
default:
fmt.Printf("Default on %v\n", i)
}
}
println("Closing")
close(stuff)
}()
time.Sleep(time.Second)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
}
This will print:
Default on 0
Default on 1
Default on 2
Default on 3
Default on 4
Closing
0
0
0
0
0
While I do understand that only 0s will get printed I do not really understand why the first send does still trigger the default branch of the select?
What is the logic behind the behavior of a select in this case?
Example at the Go Playground

You never send any values to stuff; you execute all the default cases before main reaches any of the receive operations in the fmt.Println statements. The default case is taken immediately if there is no other operation that can proceed, which means that your loop will execute and return as quickly as possible.
You want to block the loop, so you don't need the default case. You don't need the close at the end either, because you're not relying on the closed channel unblocking a receive or breaking from a range clause.
stuff := make(chan int)
go func() {
for i := 0; i < 5; i++ {
select {
case stuff <- i:
fmt.Printf("Sent %v\n", i)
}
}
println("Closing")
}()
time.Sleep(time.Second)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
https://play.golang.org/p/k2rmRDP38f
Notice also that the last "Sent" line and the "Closing" line may not be printed, because no other synchronization waits for the goroutine to finish; however, that doesn't affect the outcome of this example.

Since you're using a non-blocking 'send', the stuff <- i will really only be executed if there's a reader already waiting to read things on the channel, or if the channel has some buffer. If not, the 'send' would have to block.
Now since you have a time.Sleep(time.Second) before the print statements that read from the channel, there are no readers for the channel till after 1 second has passed. The goroutine on the other hand finishes executing within that time and doesn't send anything.
You're seeing all zeroes in the output because the fmt.Println(...) statements are reading from a closed channel.
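Both effects can be reproduced in isolation. The following is a minimal sketch; trySend is an illustrative helper name, not something from the question's code.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it happened.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// No receiver is waiting and there is no free buffer slot.
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	fmt.Println(trySend(unbuffered, 0)) // false: nobody is receiving

	buffered := make(chan int, 1)
	fmt.Println(trySend(buffered, 0)) // true: the buffer has room
	fmt.Println(trySend(buffered, 1)) // false: the buffer is now full

	close(unbuffered)
	v, ok := <-unbuffered
	fmt.Println(v, ok) // 0 false: zero value from a closed channel
}
```

The second result of the receive, ok, is how a reader can tell a real 0 apart from the zero value produced by a closed channel.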

Your first case isn't executing.
Here's what your program does:
Start a goroutine.
Attempt to send 0 through 4 on the channel. Each send would block because nothing is reading from the channel, so each select takes the default case instead.
Meanwhile, in the main goroutine, you're sleeping for one second...
Then after the second has elapsed, attempt to read from the channel, but it is closed, so you get 0 every time.
To get your desired behavior, you have two choices:
Use a buffered channel, which can hold all of the data you send:
stuff := make(chan int, 5)
Don't use default in your select statement, which will cause each send to wait until it can succeed.
Which is preferred depends on your goals. For a minimal example like this, either is probably no better or worse.

It only executes the default case, because the for loop runs 5 times before anything starts reading from the channel. Each time through, because nothing can read from the channel, it goes to the default case. If something could read from the channel, it would execute that case.

Understanding spark process behaviour

I would like to understand a process's behavior. Basically, this Spark process must create at most five files, one for each territory, and save them into HDFS.
Territories are provided by an array of five strings. But when I look at the Spark UI, I see the same action being executed many times.
These are my questions:
Why has the isEmpty action been executed 4 times for each territory instead of once? I expect just one action per territory.
How is the number of tasks decided when isEmpty is calculated? The first time there is just one task, the second time there are 4, the third 20, and the fourth 35. What is the logic behind that sizing? Can I control that number in some way?
NOTE: if someone has a more, say, big-data-style solution to accomplish the same goal, please suggest it.
This is the code excerpt for the Spark process:
class IntegrationStatusD1RequestProcess {
logger.info(s"Retrieving all measurement point from DB")
val allMPoints = registryData.createIncrementalRegistryByMPointID()
.setName("allMPoints")
.persist(StorageLevel.MEMORY_AND_DISK)
logger.info("getTerritories return always an array of five String")
intStatusHelper.getTerritories.foreach { territory =>
logger.info(s"Retrieving measurement point for territory $territory")
val intStatusesChanged = allMPoints
.filter { m => m.getmPoint.substring(0, 3) == territory }
.setName(s"intStatusesChanged_${territory}")
.persist(StorageLevel.MEMORY_AND_DISK)
intStatusesChanged.isEmpty match {
case true => logger.info(s"No changes detected for territory")
case false =>
//create file and save it into hdfs
}
}
}
[Image: Spark UI showing all the Spark jobs]
[Images: the first two isEmpty stages]

isEmpty is inefficient if you expect it to be true!
Here's the RDD code for isEmpty:
def isEmpty(): Boolean = withScope {
partitions.length == 0 || take(1).length == 0
}
It calls take. This is an efficient implementation if you think the RDD isn't empty, but is a horrible implementation if you think that it is.
The implementation of take follows this recursive step, starting at parts = 1:
Collect the first parts partitions.
Check if this result contains >= n items.
If yes, take the first n
If no, repeat step 1 with parts = parts * 4.
This implementation strategy lets the execution short-circuit if the RDD has more elements than you want to take, which is usually true. But if your RDD has fewer elements than you want to take, you end up computing the partition #1 log4(nPartitions) + 1 times, partitions #2-4 log4(nPartitions) times, partitions #5-16 log4(nPartitions) - 1 times, and so on.
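The task counts reported in the question (1, 4, 20, 35) are consistent with a scan that grows by a factor of 4 relative to the partitions already scanned. The sketch below models only that sizing, written in Go rather than Scala so that it runs without Spark. It rests on two assumptions that are not stated in the thread: each pass covers only partitions that have not been scanned yet, and the RDD has 60 partitions in total (inferred from 1 + 4 + 20 + 35). scanSizes is an illustrative name.

```go
package main

import "fmt"

// scanSizes models how many partitions each successive pass covers
// when no element is ever found, so the scan has to keep growing.
func scanSizes(totalPartitions, scaleUp int) []int {
	var sizes []int
	scanned := 0
	for scanned < totalPartitions {
		toTry := 1 // the first pass looks at a single partition
		if scanned > 0 {
			toTry = scanned * scaleUp
		}
		if scanned+toTry > totalPartitions {
			toTry = totalPartitions - scanned // cap at what is left
		}
		sizes = append(sizes, toTry)
		scanned += toTry
	}
	return sizes
}

func main() {
	fmt.Println(scanSizes(60, 4)) // [1 4 20 35]
}
```

Under these assumptions an empty 60-partition RDD needs four passes, which matches the four isEmpty jobs per territory seen in the UI.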
A better implementation for this use case
This implementation only computes each partition once by sacrificing short-circuit capability:
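import org.apache.spark.rdd.RDD

// Reduces every partition to a single "is this partition empty?" flag, then ANDs the flags.
def fasterIsEmpty(rdd: RDD[_]): Boolean = {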
rdd.mapPartitions(it => Iterator(it.isEmpty))
.fold(true)(_ && _)
}

In Rx (or RxJava/RxScala), how to make an auto-resetting stateful latch map/filter for measuring in-stream elapsed time to touch a barrier?

Apologies if the question is poorly phrased, I'll do my best.
If I have a sequence of values with times as an Observable[(U,T)] where U is a value and T is a time-like type (or anything difference-able I suppose), how could I write an operator which is an auto-reset one-touch barrier, which is silent when abs(u_n - u_reset) < barrier, but spits out t_n - t_reset if the barrier is touched, at which point it also resets u_reset = u_n.
That is to say, the first value this operator receives becomes the baseline, and it emits nothing. Henceforth it monitors the values of the stream, and as soon as one of them is beyond the baseline value (above or below), it emits the elapsed time (measured by the timestamps of the events), and resets the baseline. These times then will be processed to form a high-frequency estimate of the volatility.
For reference, I am trying to write a volatility estimator outlined in http://www.amazon.com/Volatility-Trading-CD-ROM-Wiley/dp/0470181990 , where rather than measuring the standard deviation (deviations at regular homogeneous times), you repeatedly measure the time taken to breach a barrier for some fixed barrier amount.
Specifically, could this be written using existing operators? I'm a bit stuck on how the state would be reset, though maybe I need to make two nested operators, one which is one-shot and another which keeps creating that one-shot... I know it could be done by writing one by hand, but then I need to write my own publisher etc etc.
Thanks!

I don't fully understand the algorithm and your variables in the example, but you can use flatMap with some heap-state and return empty() or just() as needed:
int[] var1 = { 0 };
source.flatMap(v -> {
var1[0] += v;
if ((var1[0] & 1) == 0) {
return Observable.just(v);
}
return Observable.empty();
});
If you need a per-sequence state because of multiple consumers, you can defer the whole thing:
Observable.defer(() -> {
int[] var1 = { 0 };
return source.flatMap(v -> {
var1[0] += v;
if ((var1[0] & 1) == 0) {
return Observable.just(v);
}
return Observable.empty();
});
}).subscribe(...);
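Independently of the Rx plumbing, the state that has to live inside the flatMap closure is small. Here is a sketch of the barrier logic from the question as a plain function over a slice, written in Go; sample and barrierTimes are illustrative names. It assumes that the barrier counts as touched when the absolute difference is greater than or equal to it, and that the reference time is reset together with the reference value.

```go
package main

import (
	"fmt"
	"math"
)

// sample is one (value, time) pair from the stream.
type sample struct {
	value float64
	time  float64
}

// barrierTimes returns the elapsed time for every barrier touch.
// The first sample only sets the baseline and emits nothing.
func barrierTimes(samples []sample, barrier float64) []float64 {
	var elapsed []float64
	var base sample
	haveBase := false
	for _, s := range samples {
		if !haveBase {
			base, haveBase = s, true
			continue
		}
		if math.Abs(s.value-base.value) >= barrier {
			elapsed = append(elapsed, s.time-base.time)
			base = s // auto-reset: the touching sample is the new baseline
		}
	}
	return elapsed
}

func main() {
	samples := []sample{
		{100.0, 0}, {100.5, 1}, {101.2, 2}, {101.0, 3}, {100.1, 5},
	}
	fmt.Println(barrierTimes(samples, 1.0)) // [2 3]
}
```

In the flatMap version, base and haveBase would be the captured state, and the append would become returning just(elapsed) instead of empty().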

Rx debouncing inputs

I need to debounce an input-stream.
At the first occurrence of state 1 I need to wait for 5 seconds and verify whether the last state was also 1.
Only then do I have a stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-1
(result) -> 1
Here is an example of a non-stable signal.
(time) 0-1-2-3-4-5-6-7-8-9
(state) 0-0-0-0-0-1-0-1-0-0
(result) -> 0
I tried using a buffer, but a buffer has fixed starting point and I need to wait for 5 seconds starting with my first event.

Taking your requirements literally
At the first occurrence of state 1 I need to wait for 5 seconds and
verify whether the last state was also 1. Only then do I have a stable
signal.
I can come up with a few ways to solve this problem.
To clarify my assumptions, you just want to push the last value produced 5 seconds after the first occurrence of a 1. This will result in a single value sequence producing either a 0 or a 1 (ie. regardless of any further values produced past 5 seconds from the source sequence)
Here I recreate your sequence with some jiggery-pokery.
var source = Observable.Timer(TimeSpan.Zero,TimeSpan.FromSeconds(1))
.Take(10)
.Select(i=>{if(i==5 || i==7 || i==9){return 1;}else{return 0;}}); //Should produce 1;
//.Select(i=>{if(i==5 || i==7 ){return 1;}else{return 0;}}); //Should produce 0;
All of the options below look to share the sequence. To share a sequence safely in Rx we Publish() and connect it. I use automatic connecting via the RefCount() operator.
var sharedSource = source.Publish().RefCount();
1) In this solution we take the first value of 1, and then buffer the values of the sequence into buffers of 5 seconds. We only take the first of these buffers. Once we get this buffer, we take its last value and push that. If the buffer is empty, I assume we push a 1, as the last value was the '1' that started the buffer.
sharedSource.Where(state=>state==1)
.Take(1)
.SelectMany(_=>sharedSource.Buffer(TimeSpan.FromSeconds(5)).Take(1))
.Select(buffer=>
{
if(buffer.Any())
{
return buffer.Last();
}
else{
return 1;
}
})
.Dump();
2) In this solution I take the approach to only start listening once we get a valid value (1) and then take all values until a timer triggers the termination. From here we take the last value produced.
var fromFirstValid = sharedSource.SkipWhile(state=>state==0);
fromFirstValid
.TakeUntil(
fromFirstValid.Take(1)
.SelectMany(_=>Observable.Timer(TimeSpan.FromSeconds(5))))
.TakeLast(1)
.Dump();
3) In this solution I use the window operator to create a single window that opens when the first value of '1' happens and then closes when 5 seconds elapses. Again we just take the last value
sharedSource.Window(
sharedSource.Where(state=>state==1),
_=>Observable.Timer(TimeSpan.FromSeconds(5)))
.SelectMany(window=>window.TakeLast(1))
.Take(1)
.Dump();
So lots of different ways to skin-a-cat.
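The rule that all three queries implement can also be written as a plain function, which is handy for checking the expected results from the question. This is a sketch in Go over a recorded list of events rather than a live stream; event and stableState are illustrative names. It assumes the events are ordered by time and that an event exactly 5 seconds after the first 1 still falls inside the window.

```go
package main

import "fmt"

// event is one timestamped state from the input stream.
type event struct {
	second int
	state  int
}

// stableState finds the first event with state 1 and returns the state of
// the last event seen within window seconds of it. found is false when no
// event with state 1 exists.
func stableState(events []event, window int) (state int, found bool) {
	start := 0
	for _, e := range events {
		if !found {
			if e.state != 1 {
				continue // still waiting for the first 1
			}
			start, state, found = e.second, e.state, true
			continue
		}
		if e.second > start+window {
			break // the window has closed; later values are ignored
		}
		state = e.state
	}
	return state, found
}

// fromStates places one state per second, starting at second 0.
func fromStates(states []int) []event {
	events := make([]event, len(states))
	for i, s := range states {
		events[i] = event{second: i, state: s}
	}
	return events
}

func main() {
	stable := fromStates([]int{0, 0, 0, 0, 0, 1, 0, 1, 0, 1})
	unstable := fromStates([]int{0, 0, 0, 0, 0, 1, 0, 1, 0, 0})
	fmt.Println(stableState(stable, 5))   // 1 true
	fmt.Println(stableState(unstable, 5)) // 0 true
}
```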

It sounds (at a glance) like you want Throttle, not Buffer, although some more information on your use cases would help pin that down - at any rate, here's how you might Throttle your stream:
void Main()
{
var subject = new Subject<int>();
var source = subject.Publish().RefCount();
var query = source
// Start counting on a 1, wait 5 seconds, and take the last value
.Throttle(x => Observable.Timer(TimeSpan.FromSeconds(5)));
using(query.Subscribe(Console.WriteLine))
{
// This sequence should produce a one
subject.OnNext(1);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(1);
Console.ReadLine();
// This sequence should produce a zero
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(0);
subject.OnNext(1);
subject.OnNext(0);
Console.ReadLine();
}
}
