Select statement nuances - select

I read about select statements and its execution steps but I'm not fully understanding what's happening here.
I created two examples of a Fan-In function (from the Go Concurrency Patterns talk)
The first one:
select {
case value := <-g1:
c <- value
case value := <-g2:
c <- value
}
Prints from each channel as expected (each channel keeps its own counter):
Bob : 0
Alice: 0
Bob : 1
Alice: 1
Bob : 2
Alice: 2
Alice: 3
Alice: 4
Bob : 3
Alice: 5
The second one:
select {
case c <- <-g1:
case c <- <-g2:
}
It is randomly selecting a channel and discarding the other one's value:
Bob : 0
Alice: 1
Alice: 2
Alice: 3
Bob : 4
Alice: 5
Bob : 6
Alice: 7
Alice: 8
Bob : 9
Update: while writing this question, I thought the second select was equal to:
var v string
select {
case v = <-g1:
case v = <-g2:
c <- v
}
But I was wrong, because this one always prints from the second channel (as expected from a switch like statement because there isn't fallthrough in select statements):
Bob : 0
Bob : 1
Bob : 2
Bob : 3
Bob : 4
Bob : 5
Bob : 6
Bob : 7
Bob : 8
Bob : 9
Does someone understand why my second example creates a sequence?
Thanks,

Your second select statement is interpreted as:
v1 := <-g1
v2 := <-g2
select {
case c <- v1:
case c <- v2:
}
As described in the language spec, the RHS of each send operator will be evaluated up front when the statement is executed:
Execution of a "select" statement proceeds in several steps:
For all the cases in the statement, the channel operands of receive operations and the channel and right-hand-side expressions of send statements are evaluated exactly once, in source order, upon entering the "select" statement. The result is a set of channels to receive from or send to, and the corresponding values to send. Any side effects in that evaluation will occur irrespective of which (if any) communication operation is selected to proceed. Expressions on the left-hand side of a RecvStmt with a short variable declaration or assignment are not yet evaluated.
If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. If there is no default case, the "select" statement blocks until at least one of the communications can proceed.
...
So as step (1) both <-g1 and <-g2 will be evaluated, receiving values from each channel. This may block if there is nothing to receive yet.
At (2), we wait until c is ready to send a value and then randomly choose a branch of the select statement to execute: since they are both waiting on the same channel, they are both ready to proceed.
This explains the behaviour you saw where values are dropped and you got non-deterministic behaviour in which value was sent to c.
If you want to wait on g1 and g2, you will need to use the first form for your statement as you've discovered.

According to [http://golang.org/ref/spec] The Go Programming Language Specification
for { // send random sequence of bits to c
select {
case c <- 0: // note: no statement, no fallthrough, no folding of cases
case c <- 1:
}
}
It'll generate 0 or 1 randomly.
For the second exaple
select {
case c <- <-g1:
case c <- <-g2:
}
when g1 has Bob : 0 and g2 has Alice: 0, either c <- <-g1 or c <- <-g2 will execute , but only one will.
That explain why you have the sequence 0 1 2 3 4 5 6 7 8 9 but not 0 0 1 1 2 2 3 3 4 4
And it also say:
in source order, upon entering the "select" statement. The result is a set of channels to receive from or send to, and the corresponding values to send.
According to my comprehension , even if c <- <-g1 will execute , Alice: 0 also will pop up from g2. So every time you have Bob : i and Alice: i, only one will be printed out.

Related

Processing each row in kdb table and appending arbitrary results in a new table

I have a table
t:([]a:`a`b`c;b:1 2 3;c:`x`y`z)
I would like to iterate and process each row.
The thing is that the processing logic for each row may result in arbitrary lines of data, after the full iteration the result maybe as such e.g.
results:([]a:`a1`b1`b2`b3`c1`c2;x:1 2 2 2 3 3)
I have the following idea so far but doesn't seem to work:
uj { // some processing function } each t
But how does one return arbitrary number of data append the results into a new table?
Assuming you are using something from the table entries to indicate your arbitrary value, you can use a dictionary to indicate a number (or a function) which can be used to apply these values.
In this example, I use the c column of the original table to indicate the number of rows to return (and the number from 1 to count to).
As each entry of the table is a dictionary, I can index using the column names to get the values and build a new table.
I also use raze to join each of the results together, as they will each have the same schema.
raze {[x]
d:`x`y`z!1 3 2;
([]a:((),`$string[x[`a]],/:string 1+til d[x[`c]]);x:((),d[x[`c]])#x[`b])
} each t
Not sure if this is what you want, but you can try something like this:
ungroup select a:`${y,/:x}[string b]'[string a],b from t
Or you can use accumulators if you need the result of the previous row calculations like this:
{y[`b]+:last[x]`b;x,y}/[t;t]
If your processing function is outputting tables that conform, just raze should suffice:
raze {y#enlist x}'[t;1 3 2]
a b c
-----
a 1 x
b 2 y
b 2 y
b 2 y
c 3 z
c 3 z
Otherwise use (uj/)
(uj/) {y#enlist x}'[t;1 3 2]
a b c
-----
a 1 x
b 2 y
b 2 y
b 2 y
c 3 z
c 3 z
Your best answer will depend very much on how you want to use the results computed from each row of t. It might suit you to normalise t; it might not. The key point here:
A table cell can be any q data structure.
The minimum you can do in this regard is to store the result of your processing function in a new column.
Below, an arbitrary binary function f returns its result as a dictionary.
q)f:{n:1+rand 3;(`$string[x],/:"123" til n)!n#y}
q)f [`a;2]
a1| 2
a2| 2
q)update d:a f'b from t
a b c d
---------------------
a 1 x `a1`a2`a3!1 1 1
b 2 y (,`b1)!,2
c 3 z `c1`c2!3 3
But its result could be any q data structure.
You were considering a unary processing function:
q)pf:{#[x;`d;:;] f . x`a`b}
q)pf each t
a b c d
---------------------
a 1 x `a1`a2`a3!1 1 1
b 2 y `b1`b2!2 2
c 3 z `c1`c2`c3!3 3 3
You might find other suggestions at KX Community.
If I understand correctly your question you need something like this :
(uj/){}each t
Check this bit :
(uj/)enlist[t],{x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
This part :
x:update x:i from
// functional form of a function that takes random rows/columns
?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];
// some for of if-else and an update to generate column a (not bullet proof)
{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]
Basically the above gives something like :
q){x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
+`a`b`c`x!(`a0`a1`a2`a3`a4`a5`a6`a7;1 1 1 1 1 1 1 1;`x`x`x`x`x`x`x`x;0 1 2 3 ..
+`a`x!(`a0`a1`a2`a3`a4`a5;0 1 2 3 4 5)
+`a`b`c`x!(`a0`a1`a2;1 1 1;`x`x`x;0 1 2)
+`a`b`c`x!(`a0`a1`a2`a3`a4`a5`a6`a7`a8`a9`a10`a11;1 1 1 1 1 1 1 1 1 1 1 1;`x`..
or taking the first one :
q)first{x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
a b x
--------
a0 1 0
a1 1 1
a2 1 2
a3 1 3
a4 1 4
a5 1 5
a6 1 6
a7 1 7
a8 1 8
a9 1 9
a10 1 10
You can do
(uj/)enist[t],{ // some function }each t
to get what you want. Drop the enlist[t] if you don't want the table you start with in your result
Hope this helps.

iter function over table as input - does order matter and why?

I'm totally new to kdb+/q, and I found this problem below quite confusing to me. Just to simplify, we say we have this one line function f returns an one-row table with preset values, and I want to run this function over a combination of inputs x and y, like dates (list) and metas (table, with columns like orderid, px, size etc).
Now, I listed two ways to do so below. Since the function f doesn't really use any of the input, I would suppose the order of x and y doesn't matter since the difference is just which one is passed to f before another and only when two inputs passed would f starts to operate.
But why I got error in the second way, i.e. table follows the list?
Any idea and explanation is much appreciated.
f: {[x;y]
([] m: enlist `M; n: enlist `N)
};
x: 1 2 3;
y: ([] a: 4 5 6; b: 7 8 9);
raze raze f ' [y] ' [x]; // this one works
raze raze f ' [x] ' [y]; // this one gives ERROR: length Explanation: Arguments do not conform
What you're doing is effectively equivalent to:
f:{y;1};
q)(f'[([]a:1 2 3;b:4 5 3)])#/:1 2 3
1 1 1
1 1 1
1 1 1
(using extra brackets to make it clear the order of operation).
In this situation each one reduces to
q)f'[([]a:1 2 3;b:4 5 3);1]
1 1 1
q)f'[([]a:1 2 3;b:4 5 3);2]
1 1 1
q)f'[([]a:1 2 3;b:4 5 3);3]
1 1 1
The "length" is ok here because the "y" values are atomic and kdb automatically expands those atomic values to match the length of the table. In order words, kdb treats these as:
q)f'[([]a:1 2 3;b:4 5 3);1 1 1]
1 1 1
q)f'[([]a:1 2 3;b:4 5 3);2 2 2]
1 1 1
q)f'[([]a:1 2 3;b:4 5 3);3 3 3]
1 1 1
However, when you change the order it becomes:
(f'[1 2 3])#/:([]a:1 2 3;b:4 5 3)
which is equivalent to:
f'[1 2 3;`a`b!1 4]
f'[1 2 3;`a`b!2 5]
f'[1 2 3;`a`b!3 3]
but now you do have a length problem because the dictionaries in the "y" variable are not atomic, they have length 2. Which doesn't match the length of the list (3).
You don’t say so but it looks like you are studying how to iterate a binary function f over list arguments, which has brought you to projecting f' onto x, which gives you a unary f'[x] that you then iterate over y. If that’s how we got here, what you want might be as simple as x f'y, which iterates f over corresponding items in x and y.
However, you mention combinations of inputs. If you want effectively a Cartesian product based on f, then combine the iterators Each Right and Each Left to get x f:/:\:y.
That returns a matrix. You have razed your result. Depending on your argument types, you might be able to use cross to generate all the argument pair combinations, and Apply Each .' to apply f to each pair:
f .' x cross y

Understanding how to read each-right and each-left combined in kdb

From q for mortals, i'm struggling to understand how to read this, and understand it logically.
1 2 3,/:\:10 20
I understand the result is a cross product when in full form: raze 1 2 3,/:\:10 20.
But reading from left to right, I'm currently lost at understanding what this yields (in my head)
\:10 20
combined with 1 2 3,/: ??
Help in understanding how to read this clearly (in words or clear logic) would be appreciated.
I found myself saying the following in my head whilst I program the syntax in q. q works from right to left.
Internal Monologue -> Join the string on the right onto each of the strings on the left
code -> "ABC",\:"-D"
result -> "A-D"
"B-D"
"C-D"
I think that's an easy way to understand it. 'join' can be replaced with whatever...
Internal Monologue -> Does the string on the right match any of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:"CAT"
result -> 0010b
Each-right is the same concept and combining them is straightforward also;
Internal Monologue -> Does each of the strings on the right match each of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:/:("CAT";"Dog")
result -> 0010b
0100b
So in your example 1 2 3,/:\:10 20 - you're saying 'Join each of the elements on the right to each of the elements on the left'
Hope this helps!!
EDIT To add a real world example.... - consider the following table
q)show tab:([] upper syms:10?`2; names:10?("Robert";"John";"Peter";"Jenny"); amount:10?til 10)
syms names amount
--------------------
CF "Peter" 8
BP "Robert" 1
IC "John" 9
IN "John" 5
NM "Peter" 4
OJ "Jenny" 6
BJ "Robert" 6
KH "John" 1
HJ "Peter" 8
LH "John" 5
q)
I you want to get all records where the name is Robert, you can do; select from tab where names like "Robert"
But if you want to get the results where the name is either Robert or John, then it is a perfect scenario to use our each-left and each-right.
Consider the names column - it's a list of strings (a list where each element is a list of chars). What we want to ask is 'does any of the strings in the names column match any of the strings we want to find'... that translates to (namesList)~\:/:(list;of;names;to;find). Here's the steps;
q)(tab`names)~\:/:("Robert";"John")
0100001000b
0011000101b
From that result we want a compiled list of booleans where each element is true of it is true for Robert OR John - for example, if you look at index 1 of both lists, it's 1b for Robert and 0b for John - in our result, the value at index 1 should be 1b. Index 2 should be 1b, index3 should be 1b, index4 should be 0b etc... To do this, we can apply the any function (or max or sum!). The result is then;
q)any(tab`names)~\:/:("Robert";"John")
0111001101b
Putting it all together, we get;
q)select from tab where any names~\:/:("Robert";"John")
syms names amount
--------------------
BP "Robert" 1
IC "John" 9
IN "John" 5
BJ "Robert" 6
KH "John" 1
LH "John" 5
q)
Firstly, q is executed (and hence generally read) right to left. This means that it's interpreting the \: as a modifier to be applied to the previous function, which itself is a simple join modified by the /: adverb. So the way to read this is "Apply join each-right to each of the left-hand arguments."
In this case, you're applying the two adverbs to the join - \:10 20 on its own has no real meaning here.
I find it helpful to also look at the converse case 1 2 3,\:/:10 20, running that code produces a 2x6 matrix, which I'd describe more like "apply join each-left to each of the right hand arguments" ... I hope that makes sense.
An alternative syntax which also might help is ,/:\:[1 2 3;10 20] - this might be useful as it makes it very clear what the function you're applying is, and is equivalent to your in-place notation.

Joining multiple times in kdb

I have two tables
table 1 (orders) columns: (date,symbol,qty)
table 2 (marketData) columns: (date,symbol,close price)
I want to add the close for T+0 to T+5 to table 1.
{[nday]
value "temp0::update date",string[nday],":mdDates[DateInd+",string[nday],"] from orders";
value "temp::temp0 lj 2! select date",string[nday],":date,sym,close",string[nday],":close from marketData";
table1::temp
} each (1+til 5)
I'm sure there is a better way to do this, but I get a 'loop error when I try to run this function. Any suggestions?
See here for common errors. Your loop error is because you're setting views with value, not globals. Inside a function value evaluates as if it's outside the function so you don't need the ::.
That said there's lots of room for improvement, here's a few pointers.
You don't need the value at all in your case. E.g. this line:
First line can be reduced to (I'm assuming mdDates is some kind of function you're just dropping in to work out the date from an integer, and DateInd some kind of global):
{[nday]
temp0:update date:mdDates[nday;DateInd] from orders;
....
} each (1+til 5)
In this bit it just looks like you're trying to append something to the column name:
select date",string[nday],":date
Remember that tables are flipped dictionaries... you can mess with their column names via the keys, as illustrated (very noddily) below:
q)t:flip `a`b!(1 2; 3 4)
q)t
a b
---
1 3
2 4
q)flip ((`$"a","1"),`b)!(t`a;t`b)
a1 b
----
1 3
2 4
You can also use functional select, which is much neater IMO:
q)?[t;();0b;((`$"a","1"),`b)!(`a`b)]
a1 b
----
1 3
2 4
Seems like you wanted to have p0 to p5 columns with prices corresponding to date+0 to date+5 dates.
Using adverb over to iterate over 0 to 5 days :
q)orders:([] date:(2018.01.01+til 5); sym:5?`A`G; qty:5?10)
q)data:([] date:20#(2018.01.01+til 10); sym:raze 10#'`A`G; price:20?10+10.)
q)delete d from {c:`$"p",string[y]; (update d:date+y from x) lj 2!(`d`sym,c )xcol 0!data}/[ orders;0 1 2 3 4]
date sym qty p0 p1 p2 p3 p4
---------------------------------------------------------------
2018.01.01 A 0 10.08094 6.027448 6.045174 18.11676 1.919615
2018.01.02 G 3 13.1917 8.515314 19.018 19.18736 6.64622
2018.01.03 A 2 6.045174 18.11676 1.919615 14.27323 2.255483
2018.01.04 A 7 18.11676 1.919615 14.27323 2.255483 2.352626
2018.01.05 G 0 19.18736 6.64622 11.16619 2.437314 4.698096

Change the class of columns in a data frame

First of all, excuse me if I do any mistakes, but English is not a language I use very often.
I have a data frame with numbers. A small part of the data frame is this:
nominal 2
2
2
2
ordinal
2
1
1
2
So, I want to use the gower distance function on these numbers.
Here ( http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/StatMatch/man/gower.dist.html ) says that in order to use gower.dist, all nominal variables must be of class "factor" and all ordinal variables of class "ordered".
By default, all the columns are of class "integer" and mode "numeric". In order to change the class of the columns, i use these commands:
DF=read.table("clipboard",header=TRUE,sep="\t")
# I select all the cells and I copy them to the clipboard.
#Then R, with this command, reads the data from there.
MyHeader=names(DF) # I save the headers of the data frame to a temp matrix
for (i in 1:length(DF)) {
if (MyHeader[[i]]=="nominal") DF[[i]]=as.factor(DF[[i]])
}
for (i in 1:length(DF)) {
if (MyHeader[[i]]=="ordinal") DF[[i]]=as.ordered(DF[[i]])
}
The first for/if loop changes the class from integer to factor, which is what I want, but the second changes the class of ordinal variables to: "ordered" "factor".
I need to change all the columns with the header "ordinal" to "ordered", as the gower.dist function says.
Thanks in advance,
B.T.
What you are doing is fine --- if perhaps a little inelegantly.
With your ordered factor, you have something like:
> foo <- as.ordered(1:10)
> foo
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10
> class(foo)
[1] "ordered" "factor"
Notice that it has two classes, indicating that it is an ordered factor and that is is a factor:
> is.ordered(as.ordered(1:10))
[1] TRUE
> is.factor(as.ordered(1:10))
[1] TRUE
In some senses, you might like to think that foo is an ordered factor but also inherits from the factor class too. Alternatively, if there isn't a specific method that handles ordered factors, but there is a method for factors, R will use the factor method. As far as R is concerned, an ordered factor is an object with classes "ordered" and "factor". This is what your function for Gower's distance will require.
You could easily do this with:
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
which gives you a dataframe with the correct structure. If you work with data frames, please stay away from [[]] unless you know very well what you're doing. Take Dirks advice, and check Owen's R Guide as well. You definitely need it.
If i do the conversion as I showed above, gower.dist() works perfectly fine. On a sidenote, the gowers distance can easily be calculated using the daisy() function as well:
DF <- data.frame(
ordinal= c(1,2,3,1,2,1),
nominal= c(2,2,2,2,2,2)
)
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
library(cluster)
daisy(DF,metric="gower")
library(StatMatch)
gower.dist(DF)