kdb - Simplifying a query with maps and lj - kdb

I have a few tables already loaded into memory:
//the real table is huge
st:([] s:`a`a`a`b`b`b;n:3 5 7 3 5 7; v:`U20`U30`U50`U22`U33`U44)
//step function
st1:`s#select first v by s,n from st
//mapping function
f:{ (st1([] s:(),x 0;n:(),x 1))`v}
/Another table
t:([] s:`a`b`b;v:4 6 8)
//user input
MAP:([ KEY:`U20`U33`M40 ] VAL:200 330 440 )
Is there a way to simplify the following? Here I am creating a temporary column KEY for the lj and then deleting it:
delete KEY from (update KEY:first each f each (s,'v) from t) lj MAP

You can do it in one line and avoid f by forming tables to index into other tables. This vectorizes the lookup and should make it faster by avoiding the each-both join and the each iterations in your final line:
q)update VAL:MAP[([]KEY:st1[([]s;n:v);`v]);`VAL] from t
s v VAL
-------
a 4 200
b 6 330
b 8
q)
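As a minimal sketch of the table-indexing idea used above, assuming a small hypothetical keyed table kt that is not part of the original data: indexing a keyed table with a table of key values looks up all rows in one vectorized step, and keys that are missing come back as nulls.

```q
// kt is a hypothetical keyed table used only for illustration
kt:([k:`a`b`c] v:10 20 30)
// index with a table whose column names match the key columns;
// the result is a table of the value columns, one row per lookup
r:kt ([] k:`b`z`a)
// supply a column name as a second index to get a plain list back
vals:kt[([] k:`b`z`a);`v]
```

The same pattern is what lets the one-liner chain two lookups (st1, then MAP) without a temporary column.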

Why not synchronise your column names so that you can do direct keyed-table lookups?
q)st:([] s:`a`a`a`b`b`b;v:3 5 7 3 5 7; KEY:`U20`U30`U50`U22`U33`U44)
q)st1:`s#select first KEY by s,v from st
q)t,'MAP st1 t
s v VAL
-------
a 4 200
b 6 330
b 8

If you can convert the user input (currently a keyed table) to a dictionary:
MAP2:`U20`U33`M40!200 330 440
The new query with MAP2:
update VAL:MAP2@f each (s,'v) from t
Actually, the following is a bit simpler and faster:
update VAL:MAP2@f ( s;v) from t
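The conversion from keyed table to dictionary mentioned above could be sketched as follows. This is only an illustration, assuming MAP has exactly the KEY and VAL columns shown in the question.

```q
MAP:([KEY:`U20`U33`M40] VAL:200 330 440)
// unkey the table, then pair the KEY column with the VAL column
MAP2:exec KEY!VAL from 0!MAP
// dictionary lookup is vectorized; unknown keys return a null
looked:MAP2 `U20`U22`U33
```

Here looked holds one value per requested key, with a null for U22 because it is not in MAP.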

Related

Processing each row in kdb table and appending arbitrary results in a new table

I have a table
t:([]a:`a`b`c;b:1 2 3;c:`x`y`z)
I would like to iterate and process each row.
The thing is that the processing logic for each row may produce an arbitrary number of lines of data; after the full iteration the result may look like this, e.g.
results:([]a:`a1`b1`b2`b3`c1`c2;x:1 2 2 2 3 3)
I have the following idea so far, but it doesn't seem to work:
uj { // some processing function } each t
But how does one return an arbitrary number of rows per input row and append the results into a new table?
Assuming you are using something from the table entries to indicate your arbitrary value, you can use a dictionary to indicate a number (or a function) which can be used to apply these values.
In this example, I use the c column of the original table to indicate the number of rows to return (and the number from 1 to count to).
As each entry of the table is a dictionary, I can index using the column names to get the values and build a new table.
I also use raze to join each of the results together, as they will each have the same schema.
raze {[x]
  d:`x`y`z!1 3 2;
  ([]a:((),`$string[x[`a]],/:string 1+til d[x[`c]]);x:((),d[x[`c]])#x[`b])
  } each t
Not sure if this is what you want, but you can try something like this:
ungroup select a:`${y,/:x}[string b]'[string a],b from t
Or you can use accumulators if you need the result of the previous row calculations like this:
{y[`b]+:last[x]`b;x,y}/[t;t]
If your processing function is outputting tables that conform, just raze should suffice:
raze {y#enlist x}'[t;1 3 2]
a b c
-----
a 1 x
b 2 y
b 2 y
b 2 y
c 3 z
c 3 z
Otherwise use (uj/)
(uj/) {y#enlist x}'[t;1 3 2]
a b c
-----
a 1 x
b 2 y
b 2 y
b 2 y
c 3 z
c 3 z
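To illustrate when (uj/) is the better fit, here is a small sketch with two hypothetical tables t1 and t2 (not from the question) whose columns differ. Union join keeps every column and fills the gaps with nulls.

```q
// t1 and t2 are hypothetical per-row results with different schemas
t1:([] a:1 2; b:3 4)
t2:([] a:5 6; c:7 8)
// uj aligns on column names and pads missing cells with nulls
r:(uj/)(t1;t2)
```

The result r has columns a, b and c and four rows; b is null for the rows that came from t2, and c is null for the rows that came from t1.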
Your best answer will depend very much on how you want to use the results computed from each row of t. It might suit you to normalise t; it might not. The key point here:
A table cell can be any q data structure.
The minimum you can do in this regard is to store the result of your processing function in a new column.
Below, an arbitrary binary function f returns its result as a dictionary.
q)f:{n:1+rand 3;(`$string[x],/:"123" til n)!n#y}
q)f [`a;2]
a1| 2
a2| 2
q)update d:a f'b from t
a b c d
---------------------
a 1 x `a1`a2`a3!1 1 1
b 2 y (,`b1)!,2
c 3 z `c1`c2!3 3
But its result could be any q data structure.
You were considering a unary processing function:
q)pf:{@[x;`d;:;] f . x`a`b}
q)pf each t
a b c d
---------------------
a 1 x `a1`a2`a3!1 1 1
b 2 y `b1`b2!2 2
c 3 z `c1`c2`c3!3 3 3
You might find other suggestions at KX Community.
If I understand your question correctly, you need something like this:
(uj/){}each t
Check this bit:
(uj/)enlist[t],{x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
This part:
x:update x:i from
// functional form of a function that takes random rows/columns
?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];
// some form of if-else and an update to generate column a (not bulletproof)
{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]
Basically the above gives something like:
q){x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
+`a`b`c`x!(`a0`a1`a2`a3`a4`a5`a6`a7;1 1 1 1 1 1 1 1;`x`x`x`x`x`x`x`x;0 1 2 3 ..
+`a`x!(`a0`a1`a2`a3`a4`a5;0 1 2 3 4 5)
+`a`b`c`x!(`a0`a1`a2;1 1 1;`x`x`x;0 1 2)
+`a`b`c`x!(`a0`a1`a2`a3`a4`a5`a6`a7`a8`a9`a10`a11;1 1 1 1 1 1 1 1 1 1 1 1;`x`..
or taking the first one:
q)first{x:update x:i from?[rand[20]#enlist x;();0b;{x!x}rand[4]#cols[x]];{(x;![x;();0b;(enlist`a)!enlist($;enlist`;((';{raze string(x;y)});`a;`i))])[y~`a]}/[x;cols x]}each t
a b x
--------
a0 1 0
a1 1 1
a2 1 2
a3 1 3
a4 1 4
a5 1 5
a6 1 6
a7 1 7
a8 1 8
a9 1 9
a10 1 10
You can do
(uj/)enlist[t],{ // some function }each t
to get what you want. Drop the enlist[t] if you don't want the table you start with in your result.
Hope this helps.

How do I convert a dictionary of dictionaries into a table?

I've got a dictionary of dictionaries:
`1`2!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)))
| a b c
-| -----
1| 1 2 3
2| 4 5 6
I'm trying to work out how to turn it into a table that looks like:
1 a 1
1 b 2
1 c 3
2 a 4
2 b 5
2 c 6
What's the easiest/'right' way to achieve this in KDB?
Not sure if this is the shortest or best way, but my solution is:
ungroup flip`c1`c2`c3!
  {(key x;value key each x;value value each x)}
  `1`2!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)))
This gives the expected table with column names c1, c2, c3.
What you're essentially trying to do is to "unpivot" - see the official pivot page here: https://code.kx.com/q/kb/pivoting-tables/
Unfortunately that page doesn't give a function for unpivoting as it isn't trivial and it's hard to have a general solution for it, but if you search the Kx/K4/community archives for "unpivot" you'll find some examples of unpivot functions, for example this one from Aaron Davies:
unpiv:{[t;k;p;v;f] ?[raze?[t;();0b;{x!x}k],'/:(f C){![z;();0b;x!enlist each (),y]}[p]'v xcol't{?[x;();0b;y!y,:()]}/:C:(cols t)except k;enlist(not;(.q.each;.q.all;(null;v)));0b;()]};
Using this, your problem (after a little tweak to the input) becomes:
q)t:([]k:`1`2)!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)));
q)`k xasc unpiv[t;1#`k;1#`p;`v;::]
k v p
-----
1 1 a
1 2 b
1 3 c
2 4 a
2 5 b
2 6 c
This solution is probably more complicated than it needs to be for your use case as it tries to solve for the general case of unpivoting.
Just an update to this, I solved this problem a different way to the selected answer.
In the end, I:
Converted each row into a table with one row in it and all the columns I needed.
Joined all the tables together.
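The poster did not share code for this approach, so the following is only a guess at what the two steps might look like. The column names c1, c2 and c3 are assumptions borrowed from the earlier answer.

```q
d:`1`2!(`a`b`c!1 2 3;`a`b`c!4 5 6)
// step 1: build one small table per outer key
//   k is the outer key, v is the inner dictionary for that key
mk:{[k;v] ([] c1:count[v]#k; c2:key v; c3:value v)}
// step 2: join all the small tables together
res:raze mk'[key d;value d]
```

Because every small table has the same schema, raze is enough to join them; (uj/) would only be needed if the inner dictionaries had different shapes.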

function return a table in kdb/q

I'm new to kdb/q and not familiar with q functions; I hope someone can help me. Here is the question:
I have a simple q function declared as the following:
func:{[x;y] x+y}
And {[x;y] x+y}[3;4] gives me the answer 7. Everything works perfectly.
If I have a table t with two columns such as:
_x _y
--------
3 4
2 5
6 2
...
Could I have a function in q that computes x+y for each row of table t?
And my expected return would be something like:
res
---
7
7
8
...
Thanks so much!
You can just pass the column names as parameters to the function:
q)tab:([]x:1 2 3;y:4 5 6)
q)func:{[x;y] x+y}
q)
q)select res:func[x;y]from tab
res
---
5
7
9
Alternatively you could use functional form to turn that query into a function:
q){?[x;();0b;enlist[`res]!enlist(`func;y;z)]}[tab;`x;`y]
res
---
5
7
9
Since + is overloaded to work with both atoms and lists, res:func[x;y] will work perfectly fine. However, when a dyadic function only accepts atoms rather than lists as arguments, each-both will do the trick:
q)select res:func'[x;y] from tab // using each-both func'[x;y]
res
---
5
7
9
For example, to take as many characters from column s as given by c:
q)tab2:([] c:1 2 3;s:("123";"1234";"123456"))
q)update res:#'[c;s] from tab2 //func'[x;y]
c s res
-----------------------
1 "123" enlist "1"
2 "1234" "12"
3 "123456" "123"

Create new binary column based off of join in spark

My situation is I have two spark data frames, dfPopulation and dfSubpopulation.
dfSubpopulation is just that, a subpopulation of dfPopulation.
I would like a clean way to create a new column in dfPopulation that is binary of whether the dfSubpopulation key was in the dfPopulation key. E.g. what I want is to create the new DataFrame dfPopulationNew:
dfPopulation = X Y key
1 2 A
2 2 A
3 2 B
4 2 C
5 3 C
dfSubpopulation = X Y key
1 2 A
3 2 B
4 2 C
dfPopulationNew = X Y key inSubpopulation
1 2 A 1
2 2 A 0
3 2 B 1
4 2 C 1
5 3 C 0
I know this could be done fairly simply with a SQL statement; however, given that a lot of Spark's optimization now uses the DataFrame construct, I would like to utilize that.
Using Spark SQL rather than DataFrame operations should make no difference from a performance perspective: the execution plan is the same. That said, here is one way to do it using a join:
import org.apache.spark.sql.functions.lit

// flag every subpopulation row with 1, left-join onto the population,
// then fill the unmatched rows with 0
val dfPopulationNew = dfPopulation.join(
    dfSubpopulation.withColumn("inSubpopulation", lit(1)),
    Seq("X", "Y", "key"),
    "left_outer")
  .na.fill(0, Seq("inSubpopulation"))

Joining multiple times in kdb

I have two tables
table 1 (orders) columns: (date,symbol,qty)
table 2 (marketData) columns: (date,symbol,close price)
I want to add the close for T+0 to T+5 to table 1.
{[nday]
value "temp0::update date",string[nday],":mdDates[DateInd+",string[nday],"] from orders";
value "temp::temp0 lj 2! select date",string[nday],":date,sym,close",string[nday],":close from marketData";
table1::temp
} each (1+til 5)
I'm sure there is a better way to do this, but I get a 'loop error when I try to run this function. Any suggestions?
See here for common errors. Your loop error is because you're setting views with value, not globals. Inside a function value evaluates as if it's outside the function so you don't need the ::.
That said there's lots of room for improvement, here's a few pointers.
You don't need the value at all in your case. For example, the first line can be reduced to the following (I'm assuming mdDates is some kind of function you're just dropping in to work out the date from an integer, and DateInd some kind of global):
{[nday]
temp0:update date:mdDates[nday;DateInd] from orders;
....
} each (1+til 5)
In this bit it just looks like you're trying to append something to the column name:
select date",string[nday],":date
Remember that tables are flipped dictionaries... you can mess with their column names via the keys, as illustrated (very noddily) below:
q)t:flip `a`b!(1 2; 3 4)
q)t
a b
---
1 3
2 4
q)flip ((`$"a","1"),`b)!(t`a;t`b)
a1 b
----
1 3
2 4
You can also use functional select, which is much neater IMO:
q)?[t;();0b;((`$"a","1"),`b)!(`a`b)]
a1 b
----
1 3
2 4
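Building on the renaming idea above, a small helper could append a suffix to one column name without spelling out every column. This is only a sketch: addSuffix is a hypothetical name, and it assumes the column c exists in the table.

```q
t:flip `a`b!(1 2;3 4)
// addSuffix (hypothetical): rename column c of tbl by appending n to its name
//   cols[tbl]?c finds the position of c, @[...] amends just that name,
//   and xcol applies the new list of column names to the table
addSuffix:{[tbl;c;n] @[cols tbl;cols[tbl]?c;{`$string[x],string y}[;n]] xcol tbl}
r:addSuffix[t;`a;1]
```

Here r has columns a1 and b with the data unchanged. In the orders example, something like this could be called once per day offset instead of building query strings for value.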
It seems you want columns p0 to p5 holding the prices for date+0 to date+5.
Using the over adverb to iterate over the day offsets (0 to 4 in this example; extend the list to cover more days):
q)orders:([] date:(2018.01.01+til 5); sym:5?`A`G; qty:5?10)
q)data:([] date:20#(2018.01.01+til 10); sym:raze 10#'`A`G; price:20?10+10.)
q)delete d from {c:`$"p",string[y]; (update d:date+y from x) lj 2!(`d`sym,c )xcol 0!data}/[ orders;0 1 2 3 4]
date sym qty p0 p1 p2 p3 p4
---------------------------------------------------------------
2018.01.01 A 0 10.08094 6.027448 6.045174 18.11676 1.919615
2018.01.02 G 3 13.1917 8.515314 19.018 19.18736 6.64622
2018.01.03 A 2 6.045174 18.11676 1.919615 14.27323 2.255483
2018.01.04 A 7 18.11676 1.919615 14.27323 2.255483 2.352626
2018.01.05 G 0 19.18736 6.64622 11.16619 2.437314 4.698096