select but without wait (POSIX) - select

I have the typical setup for file io which works well with select like:
int retval = select(maxfd +1 , &read_set, &write_set, &error_set, 0); // timeout==0 -> endless
But now I have a situation where I want to loop and check on every cycle if one of the file selectors become ready. I do not want to start a separate thread for that! Is there something in posix/linux which can be used, hopefully with the same FD_SET like data structures which checks for file state without waiting for them?
Yes, I can set timeout for select to a minimal value, but I hope it can be done without that.

POSIX says:
To effect a poll, the timeout parameter should not be a null pointer, and should point to a zero-valued timespec structure.
So for your application, it should be sufficient to call select like this:
struct timeval zero = { 0, 0 };
int retval = select(maxfd +1 , &read_set, &write_set, &error_set, &zero);

Related

Transferring arrays/classes/records between locales

In a typical N-Body simulation, at the end of each epoch, each locale would need to share its own portion of the world (i.e. all bodies) to the rest of the locales. I am working on this with a local-view approach (i.e. using on Loc statements). I encountered some strange behaviours that I couldn't make sense out of, so I decided to make a test program, in which things got more complicated. Here's the code to replicate the experiment.
proc log(args...?n) {
writeln("[locale = ", here.id, "] [", datetime.now(), "] => ", args);
}
const max: int = 50000;
record stuff {
var x1: int;
var x2: int;
proc init() {
this.x1 = here.id;
this.x2 = here.id;
}
}
class ctuff {
var x1: int;
var x2: int;
proc init() {
this.x1 = here.id;
this.x2 = here.id;
}
}
class wrapper {
// The point is that total size (in bytes) of data in `r`, `c` and `a` are the same here, because the record and the class hold two ints per index.
var r: [{1..max / 2}] stuff;
var c: [{1..max / 2}] owned ctuff?;
var a: [{1..max}] int;
proc init() {
this.a = here.id;
}
}
proc test() {
var wrappers: [LocaleSpace] owned wrapper?;
coforall loc in LocaleSpace {
on Locales[loc] {
wrappers[loc] = new owned wrapper();
}
}
// rest of the experiment further down.
}
Two interesting behaviours happen here.
1. Moving data
Now, each instance of wrapper in array wrappers should live in its locale. Specifically, the references (wrappers) will live in locale 0, but the internal data (r, c, a) should live in the respective locale. So we try to move some from locale 1 to locale 3, as such:
on Locales[3] {
var timer: Timer;
timer.start();
var local_stuff = wrappers[1]!.r;
timer.stop();
log("get r from 1", timer.elapsed());
log(local_stuff);
}
on Locales[3] {
var timer: Timer;
timer.start();
var local_c = wrappers[1]!.c;
timer.stop();
log("get c from 1", timer.elapsed());
}
on Locales[3] {
var timer: Timer;
timer.start();
var local_a = wrappers[1]!.a;
timer.stop();
log("get a from 1", timer.elapsed());
}
Surprisingly, my timings show that
Regardless of the size (const max), the time of sending the array and record strays constant, which doesn't make sense to me. I even checked with chplvis, and the size of GET actually increases, but the time stays the same.
The time to send the class field increases with time, which makes sense, but it is quite slow and I don't know which case to trust here.
2. Querying the locales directly.
To demystify the problem, I also query the .locale.id of some variables directly. First, we query the data, which we expect to live in locale 2, from locale 2:
on Locales[2] {
var wrappers_ref = wrappers[2]!; // This is always 1 GET from 0, okay.
log("array",
wrappers_ref.a.locale.id,
wrappers_ref.a[1].locale.id
);
log("record",
wrappers_ref.r.locale.id,
wrappers_ref.r[1].locale.id,
wrappers_ref.r[1].x1.locale.id,
);
log("class",
wrappers_ref.c.locale.id,
wrappers_ref.c[1]!.locale.id,
wrappers_ref.c[1]!.x1.locale.id
);
}
And the result is:
[locale = 2] [2020-12-26T19:36:26.834472] => (array, 2, 2)
[locale = 2] [2020-12-26T19:36:26.894779] => (record, 2, 2, 2)
[locale = 2] [2020-12-26T19:36:27.023112] => (class, 2, 2, 2)
Which is expected. Yet, if we query the locale of the same data on locale 1, then we get:
[locale = 1] [2020-12-26T19:34:28.509624] => (array, 2, 2)
[locale = 1] [2020-12-26T19:34:28.574125] => (record, 2, 2, 1)
[locale = 1] [2020-12-26T19:34:28.700481] => (class, 2, 2, 2)
Implying that wrappers_ref.r[1].x1.locale.id lives in locale 1, even though it should clearly be on locale 2. My only guess is that by the time .locale.id is executed, the data (i.e. the .x of the record) is already moved to the querying locale (1).
So all in all, the second part of the experiment lead to a secondary question, whilst not answering the first part.
NOTE: all experiment are run with -nl 4 in chapel/chapel-gasnet docker image.
Good observations, let me see if I can shed some light.
As an initial note, any timings taken with the gasnet Docker image should be taken with a grain of salt since that image simulates the execution across multiple nodes using your local system rather than running each locale on its own compute node as intended in Chapel. As a result, it is useful for developing distributed memory programs, but the performance characteristics are likely to be very different than running on an actual cluster or supercomputer. That said, it can still be useful for getting coarse timings (e.g., your "this is taking a much longer time" observation) or for counting communications using chplvis or the CommDiagnostics module.
With respect to your observations about timings, I also observe that the array-of-class case is much slower, and I believe I can explain some of the behaviors:
First, it's important to understand that any cross-node communications can be characterized using a formula like alpha + beta*length. Think of alpha as representing the basic cost of performing the communication, independent of length. This represents the cost of calling down through the software stack to get to the network, putting the data on the wire, receiving it on the other side, and getting it back up through the software stack to the application there. The precise value of alpha will depend on factors like the type of communication, choice of software stack, and physical hardware. Meanwhile, think of beta as representing the per-byte cost of the communication where, as you intuit, longer messages necessarily cost more because there's more data to put on the wire, or potentially to buffer or copy, depending on how the communication is implemented.
In my experience, the value of alpha typically dominates beta for most system configurations. That's not to say that it's free to do longer data transfers, but that the variance in execution time tends to be much smaller for longer vs. shorter transfers than it is for performing a single transfer versus many. As a result, when choosing between performing one transfer of n elements vs. n transfers of 1 element, you'll almost always want the former.
To investigate your timings, I bracketed your timed code portions with calls to the CommDiagnostics module as follows:
resetCommDiagnostics();
startCommDiagnostics();
...code to time here...
stopCommDiagnostics();
printCommDiagnosticsTable();
and found, as you did with chplvis, that the number of communications required to localize the array of records or array of ints was constant as I varied max, for example:
locale
get
execute_on
0
0
0
1
0
0
2
0
0
3
21
1
This is consistent with what I'd expect from the implementation: That for an array of value types, we perform a fixed number of communications to access array meta-data, and then communicate the array elements themselves in a single data transfer to amortize the overheads (avoid paying multiple alpha costs).
In contrast, I found that the number of communications for localizing the array of classes was proportional to the size of the array. For example, for the default value of 50,000 for max, I saw:
locale
get
put
execute_on
0
0
0
0
1
0
0
0
2
0
0
0
3
25040
25000
1
I believe the reason for this distinction relates to the fact that c is an array of owned classes, in which only a single class variable can "own" a given ctuff object at a time. As a result, when copying the elements of array c from one locale to another, you're not just copying raw data, as with the record and integer cases, but also performing an ownership transfer per element. This essentially requires setting the remote value to nil after copying its value to the local class variable. In our current implementation, this seems to be done using a remote get to copy the remote class value to the local one, followed by a remote put to set the remote value to nil, hence, we have a get and put per array element, resulting in O(n) communications rather than O(1) as in the previous cases. With additional effort, we could potentially have the compiler optimize this case, though I believe it will always be more expensive than the others due to the need to perform the ownership transfer.
I tested the hypothesis that owned classes were resulting in the additional overhead by changing your ctuff objects from being owned to unmanaged, which removes any ownership semantics from the implementation. When I do this, I see a constant number of communications, as in the value cases:
locale
get
execute_on
0
0
0
1
0
0
2
0
0
3
21
1
I believe this represents the fact that once the language has no need to manage the ownership of the class variables, it can simply transfer their pointer values in a single transfer again.
Beyond these performance notes, it's important to understand a key semantic difference between classes and records when choosing which to use. A class object is allocated on the heap, and a class variable is essentially a reference or pointer to that object. Thus, when a class variable is copied from one locale to another, only the pointer is copied, and the original object remains where it was (for better or worse). In contrast, a record variable represents the object itself, and can be thought of as being allocated "in place" (e.g., on the stack for a local variable). When a record variable is copied from one locale to the other, it's the object itself (i.e., the record's fields' values) which are copied, resulting in a new copy of the object itself. See this SO question for further details.
Moving on to your second observation, I believe that your interpretation is correct, and that this may be a bug in the implementation (I need to stew on it a bit more to be confident). Specifically, I think you're correct that what's happening is that wrappers_ref.r[1].x1 is being evaluated, with the result being stored in a local variable, and that the .locale.id query is being applied to the local variable storing the result rather than the original field. I tested this theory by taking a ref to the field and then printing locale.id of that ref, as follows:
ref x1loc = wrappers_ref.r[1].x1;
...wrappers_ref.c[1]!.x1.locale.id...
and that seemed to give the right result. I also looked at the generated code which seemed to indicate that our theories were correct. I don't believe that the implementation should behave this way, but need to think about it a bit more before being confident. If you'd like to open a bug against this on Chapel's GitHub issues page, for further discussion there, we'd appreciate that.

Why does this select always run the default case when the first case actually is executed?

I'm trying to get a better understanding of golang channels. While reading this article I'm toying with non-blocking sends and have come up with the following code:
package main
import (
"fmt"
"time"
)
func main() {
stuff := make(chan int)
go func(){
for i := 0; i < 5; i ++{
select {
case stuff <- i:
fmt.Printf("Sent %v\n", i)
default:
fmt.Printf("Default on %v\n", i)
}
}
println("Closing")
close(stuff)
}()
time.Sleep(time.Second)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
}
This will print:
Default on 0
Default on 1
Default on 2
Default on 3
Default on 4
Closing
0
0
0
0
0
While I do understand that only 0s will get printed I do not really understand why the first send does still trigger the default branch of the select?
What is the logic behind the behavior of a select in this case?
Example at the Go Playground
You never send any values to stuff, you execute all the default cases before you get to any of the receive operations in the fmt.Println statements. The default case is taken immediately if there is no other operation than can proceed, which means that your loop will execute and return as quickly as possible.
You want to block the loop, so you don't need the default case. You don't need the close at the end either, because you're not relying on the closed channel unblocking a receive or breaking from a range clause.
stuff := make(chan int)
go func() {
for i := 0; i < 5; i++ {
select {
case stuff <- i:
fmt.Printf("Sent %v\n", i)
}
}
println("Closing")
}()
time.Sleep(time.Second)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
fmt.Println(<-stuff)
https://play.golang.org/p/k2rmRDP38f
Notice also that the last "Sent" and the "Closing" line aren't printed, because you have no other synchronization waiting for the goroutine to finish, however that doesn't effect the outcome of this example.
Since you're using a non-blocking 'send', the stuff <- i will really only be executed if there's a reader already waiting to read things on the channel, or if the channel has some buffer. If not, the 'send' would have to block.
Now since you have a time.Sleep(time.Second) before the print statements that read from the channel, there are no readers for the channel till after 1 second has passed. The goroutine on the other hand finishes executing within that time and doesn't send anything.
You're seeing all zeroes in the output because the fmt.Println(...) statements are reading from a closed channel.
Your first case isn't executing.
Here's what your program does:
Start a goroutine.
Attempt to send 0 through 4 on the channel, which all block, because there is nothing reading the channel, so fall through to the default.
Meanwhile, in the main goroutine, you're sleeping for one second...
Then after the second has elapsed, attempt to read from the channel, but it is closed, so you get 0 every time.
To get your desired behavior, you have two choices:
Use a buffered channel, which can hold all of the data you send:
stuff := make(chan int, 5)
Don't use default in your select statement, which will cause each send to wait until it can succeed.
Which is preferred depends on your goals. For a minimal example like this, either is probably no better or worse.
It only executes the default case, because the for loop runs 5 times before anything starts reading from the channel. Each time through, because nothing can read from the channel, it goes to the default case. If something could read from the channel, it would execute that case.

"Appending" to an ArraySlice?

Say ...
you have about 20 Thing
very often, you do a complex calculation running through a loop of say 1000 items. The end result is a varying number around 20 each time
you don't know how many there will be until you run through the whole loop
you then want to quickly (and of course elegantly!) access the result set in many places
for performance reasons you don't want to just make a new array each time. note that unfortunately there's a differing amount so you can't just reuse the same array trivially.
What about ...
var thingsBacking = [Thing](repeating: Thing(), count: 100) // hard limit!
var things: ArraySlice<Thing> = []
func fatCalculation() {
var pin: Int = 0
// happily, no need to clean-out thingsBacking
for c in .. some huge loop {
... only some of the items (roughly 20 say) become the result
x = .. one of the result items
thingsBacking[pin] = Thing(... x, y, z )
pin += 1
}
// and then, magic of slices ...
things = thingsBacking[0..<pin]
(Then, you can do this anywhere... for t in things { .. } )
What I am wondering, is there a way you can call to an ArraySlice<Thing> to do that in one step - to "append to" an ArraySlice and avoid having to bother setting the length at the end?
So, something like this ..
things = ... set it to zero length
things.quasiAppend(x)
things.quasiAppend(x2)
things.quasiAppend(x3)
With no further effort, things now has a length of three and indeed the three items are already in the backing array.
I'm particularly interested in performance here (unusually!)
Another approach,
var thingsBacking = [Thing?](repeating: Thing(), count: 100) // hard limit!
and just set the first one after your data to nil as an end-marker. Again, you don't have to waste time zeroing. But the end marker is a nuisance.
Is there a more better way to solve this particular type of array-performance problem?
Based on MartinR's comments, it would seem that for the problem
the data points are incoming and
you don't know how many there will be until the last one (always less than a limit) and
you're having to redo the whole thing at high Hz
It would seem to be best to just:
(1) set up the array
var ra = [Thing](repeating: Thing(), count: 100) // hard limit!
(2) at the start of each run,
.removeAll(keepingCapacity: true)
(3) just go ahead and .append each one.
(4) you don't have to especially mark the end or set a length once finished.
It seems it will indeed then use the same array backing. And it of course "increases the length" as it were each time you append - and you can iterate happily at any time.
Slices - get lost!

Counter in contiki

I'm trying to program Zolertia z1 node in Contiki, i need counter to go from 0 to 120, etimer should be set to 1 second delay(etimer_set(&et, CLOCK_SECOND)), and when i try to do counting it's constantly printing out same number (0 or 1), i think i should use PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et)) and probably etimer_restart,and after each second counter should be incremented and printed out (1, 2 ,3 ...), but obviously I'm not doing something correct in while loop or functions are not good?
This code works for me:
PROCESS_THREAD(hello_world_process, ev, data)
{
static struct etimer et;
static int counter;
PROCESS_BEGIN();
etimer_set(&et, CLOCK_SECOND);
while(1) {
PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));
printf("timer called, counter=%u\n", counter++);
etimer_reset(&et);
}
PROCESS_END();
}
Potential pitfalls:
there is no process-local storage for in-process variables in Contiki
processes. Meaning - if you want to save the values of local
variables across yields (such as PROCESS_WAIT_EVENT_UNTIL), declare
them as static. Most likely this is the problem you're facing, as it woul lead to the counter value being reset.
etimer_restart will drift, use etimer_reset instead to get the duration of exactly 120 seconds.

Help with live-updating sound on the iPhone

My question is a little tricky, and I'm not exactly experienced (I might get some terms wrong), so here goes.
I'm declaring an instance of an object called "Singer". The instance is called "singer1". "singer1" produces an audio signal. Now, the following is the code where the specifics of the audio signal are determined:
OSStatus playbackCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
//Singer *me = (Singer *)inRefCon;
static int phase = 0;
for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
int samples = ioData->mBuffers[i].mDataByteSize / sizeof(SInt16);
SInt16 values[samples];
float waves;
float volume=.5;
for(int j = 0; j < samples; j++) {
waves = 0;
waves += sin(kWaveform * 600 * phase)*volume;
waves += sin(kWaveform * 400 * phase)*volume;
waves += sin(kWaveform * 200 * phase)*volume;
waves += sin(kWaveform * 100 * phase)*volume;
waves *= 32500 / 4; // <--------- make sure to divide by how many waves you're stacking
values[j] = (SInt16)waves;
values[j] += values[j]<<16;
phase++;
}
memcpy(ioData->mBuffers[i].mData, values, samples * sizeof(SInt16));
}
return noErr;
}
99% of this is borrowed code, so I only have a basic understanding of how it works (I don't know about the OSStatus class or method or whatever this is. However, you see those 4 lines with 600, 400, 200 and 100 in them? Those determine the frequency. Now, what I want to do (for now) is insert my own variable in there in place of a constant, which I can change on a whim. This variable is called "fr1". "fr1" is declared in the header file, but if I try to compile I get an error about "fr1" being undeclared. Currently, my technique to fix this is the following: right beneath where I #import stuff, I add the line
fr1=0.0;//any number will work properly
This sort of works, as the code will compile and singer1.fr1 will actually change values if I tell it to. The problems are now this:A)even though this compiles and the tone specified will play (0.0 is no tone), I get the warnings "Data definition has no type or storage class" and "Type defaults to 'int' in declaration of 'fr1'". I bet this is because for some reason it's not seeing my previous declaration in the header file (as a float). However, again, if I leave this line out the code won't compile because "fr1 is undeclared". B)Just because I change the value of fr1 doesn't mean that singer1 will update the value stored inside the "playbackcallback" variable or whatever is in charge of updating the output buffers. Perhaps this can be fixed by coding differently? C)even if this did work, there is still a noticeable "gap" when pausing/playing the audio, which I need to eliminate. This might mean a complete overhaul of the code so that I can "dynamically" insert new values without disrupting anything. However, the reason I'm going through all this effort to post is because this method does exactly what I want (I can compute a value mathematically and it goes straight to the DAC, which means I can use it in the future to make triangle, square, etc waves easily). I have uploaded Singer.h and .m to pastebin for your veiwing pleasure, perhaps they will help. Sorry, I can't post 2 HTML tags so here are the full links.
(http://pastebin.com/ewhKW2Tk)
(http://pastebin.com/CNAT4gFv)
So, TL;DR, all I really want to do is be able to define the current equation/value of the 4 waves and re-define them very often without a gap in the sound.
Thanks. (And sorry if the post was confusing or got off track, which I'm pretty sure it did.)
My understanding is that your callback function is called every time the buffer needs to be re-filled. So changing fr1..fr4 will alter the waveform, but only when the buffer updates. You shouldn't need to stop and re-start the sound to get a change, but you will notice an abrupt shift in the timbre if you change your fr values. In order to get a smooth transition in timbre, you'd have to implement something that smoothly changes the fr values over time. Tweaking the buffer size will give you some control over how responsive the sound is to your changing fr values.
Your issue with fr being undefined is due to your callback being a straight c function. Your fr variables are declared as objective-c instance variables as part of your Singer object. They are not accessible by default.
take a look at this project, and see how he implements access to his instance variables from within his callback. Basically he passes a reference to his instance to the callback function, and then accesses instance variables through that.
https://github.com/youpy/dowoscillator
notice:
Sinewave *sineObject = inRefCon;
float freq = sineObject.frequency * 2 * M_PI / samplingRate;
and:
AURenderCallbackStruct input;
input.inputProc = RenderCallback;
input.inputProcRefCon = self;
Also, you'll want to move your callback function outside of your #implementation block, because it's not actually part of your Singer object.
You can see this all in action here: https://github.com/coryalder/SineWaver