Replace first n entries in a column in kdb - kdb

How can I replace the values in the first n columns of my table?
i.e. mycol:(1 2 3 4) to mycol:(a a 3 4)
Thank you in advance!

If it's the values within mycol that you want updated then they will need to be of the same type as the existing values. See below.
q)t:([]mycol:`$string 1+til 4;mycol2:til 4)
q)update mycol:`a from t where i<2
mycol mycol2
------------
a 0
a 1
3 2
4 3
One way around this though is to enlist mycol, that way updates of any type can be made.
q)t:([]mycol:1+til 4;mycol2:til 4)
q)update mycol:`a from(update enlist each mycol from t)where i<2
mycol mycol2
------------
`a 0
`a 1
,3 2
,4 3
q)meta update mycol:`a from(update enlist each mycol from t)where i<2
c | t f a
------| -----
mycol |
mycol2| j
It's unclear from your question whether you want the column names or the column values changed. If it's the column names, you can use xcol.
q)(2#`a)xcol([]w:3#til 3;x:3#.Q.a;y:`;z:0N)
a a y z
-------
0 a
1 b
2 c

Related

Storing output of a iterating function into unique tables KDB+/Q

I have a query function of the format:
function:{[x] select Y from table1 where date=x}
I want to iterate this function across all days stored in a separate table, like so
function each (select distinct date from table2)
and store every iteration into it's own, unique table. For example, if the first iteration was for 2021.11.10, all 'Y' values from table1 on that date are stored in a table named 'a', for the next iteration for date 2021.11.12, it goes to a table named 'b', or something like that. How can I do this?
Assume the following tables:
q)t1:([]date:.z.d + 0 0 0 1 1 1;a:1 2 3 4 5 6)
q)t2:([]date:.z.d+0 1)
then you could do something like:
q)function:{[x;y] x set select from t1 where date=y}
q)function'[`a`b;exec distinct date from t2]
`a`b
q)show a
date a
------------
2021.11.10 1
2021.11.10 2
2021.11.10 3
q)show b
date a
------------
2021.11.11 4
2021.11.11 5
2021.11.11 6
If you wanted to make it a bit more dynamic then you could also do something like:
q)function:{[x] (`$"tab",string[x] except ".") set select from t1 where date=x}
q)
q)
q)function each exec distinct date from t2
`tab20211110`tab20211111
q)show tab20211110
date a
------------
2021.11.10 1
2021.11.10 2
2021.11.10 3
q)show tab20211111
date a
------------
2021.11.11 4
2021.11.11 5
2021.11.11 6
The above creates new tables named for each date (we remove the "." from the resulting names as they could be confused for q's . operator)

select all columns with suffix _test in q kdb

I have a partitioned table, similar to below table:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8);
date a a_test b_test c
----------------------------
2019.01.01 1 2 3 6
2019.01.01 2 3 4 7
2019.01.01 3 4 5 8
Now, I want to fetch date column and all columns have names with suffix "_test" from table t.
Expected output:
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In my original table, there are more than 100 columns with name having _test so below is not a practical solution in this case.
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like below, but of no use:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable then you should use a functional qSQL statement to perform the query. The following can be used in your case
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_*"]
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
A functional select is probably the best way to go here if you require adding further filters.
?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
The functional form of any select quesry can be obtained by using the -5! operator on any SQL style statement.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator as in below:
{[x;s] (cols[x] where cols[x] like s)#x}[ tab;"a*"]

Functional update - multivariable function with dynamic columns

Any help with the following would be much appreciated!
I have two tables: table1 is a summary table whilst table2 is a list of all data points. I want to be able to summarise the information in table2 for each row in table1.
table1:flip `grp`constraint!(`a`b`c`d; 10 10 20 20);
table2:flip `grp`cat`constraint`val!(`a`a`a`a`a`b`b`b;`cl1`cl1`cl1`cl2`cl2`cl2`cl2`cl1; 10 10 10 10 10 10 20 10; 1 2 3 4 5 6 7 8);
function:{[grpL;constraintL;catL] first exec total: sum val from table2 where constraint=constraintL, grp=grpL,cat=catL};
update cl1:function'[grp;constraint;`cl1], cl2:function'[grp;constraint;`cl2] from table1;
The fourth line of this code achieves what I want for the two categories:cl1 and cl2
In table1 I want to name a new column with the name of the category (cl1, cl2, etc.) and I want the values in that column to be the output from running the function over that column.
However, I have hundreds of different categories, so don't want to have to list them out manually as in the fourth line. How would I pass in a list of categories, e.g. below?
`cl1`cl2`cl3
Sticking to your approach, you would just have to make your update statement functional and then iterate over the columns like so:
{![`table1;();0b;(1#x)!enlist ((';function);`grp;`constraint;1#x)]} each `cl1`cl2
Assuming you can amend table1 in place. If you must retain the original table1 then you can pass it by value though it will consume more memory
{![x;();0b;(1#y)!enlist ((';function);`grp;`constraint;1#y)]}/[table1;`cl1`cl2]
Another approach would be to aggregate, pivot and join though it's not necessarily a better solution as you get nulls rather than zeros
a:select sum val by cat,grp,constraint from table2
p:exec (exec distinct cat from a)#cat!val by grp,constraint from a
table1 lj p
There are several different methods you can look into.
The easiest method would be a functional update - http://code.kx.com/wiki/JB:QforMortals2/queries_q_sql#Functional_update
Below, though, should somewhat prove more useful, quicker and neater:
Your problem can be split into 2 parts. For the first part, you are looking to create a sum of each category by grp and constraint within table2. As for the second part, you are looking to join these results (the lookups) onto the corresponding records from table1.
You can create the necessary groups using by
q)exec val,cat by grp,constraint from table2
grp constraint| val cat
--------------| ------------------------------
a 10 | 1 2 3 4 5 `cl1`cl1`cl1`cl2`cl2
b 10 | 6 8 `cl2`cl1
b 20 | ,7 ,`cl2
Note though, this will only create nested lists of the columns in your select query
Next is to sum each of the cat groups
q)exec sum each val group cat by grp,constraint from table2
grp constraint|
--------------| ------------
a 10 | `cl1`cl2!6 9
b 10 | `cl2`cl1!6 8
b 20 | (,`cl2)!,7
Then, to create the cat's columns you can use a pivot like syntax - http://code.kx.com/wiki/Pivot
q)cats:asc exec distinct cat from table2
q)exec cats#sum each val group cat by grp,constraint from table2
grp constraint| cl1 cl2
--------------| -------
a 10 | 6 9
b 10 | 8 6
b 20 | 7
Now you can use this lookup table and index into each row from table1
q)(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
To fill the nulls with zeros, use the carat symbol - http://code.kx.com/wiki/Reference/Caret
q)0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
0 0
0 0
And now you can join on each row from table1 to your results using join-each
q)table1,'0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
grp constraint cl1 cl2
----------------------
a 10 6 9
b 10 8 6
c 20 0 0
d 20 0 0
HTH, Sean
This approach is the easiest way to pass in a list of categories
{table1^flip x!function'[table1`grp;table1`constraint;]each x}`cl1`cl2

kdb how to calculate rolling count

Assume I have a table of events, with Timestamp and Type.
t1, 'b'
t2, 'x'
t3, 's'
t4, 'b'
How can I get a rolling count such that it would give me a list of all timestamps and the cummulative number of events up to taht ts, sort of like a count version of sums
for example for 'b' I d like a table
't1', 1
't2', 1
't3', 1
't4', 2
Here is one way to do it, although there may be a more clever way this uses sums:
//table definition
tab:([]a:`t1`t2`t3`t4;b:"bxsb")
//rolling sum of 1 by column b
update sums count[i]#1 by b from tab
Results in:
a b x
------
t1 b 1
t2 x 1
t3 s 1
t4 b 2
If you wanted replace b you would simply put b: in front of the sums .
One way:
q)t:([]p:asc 4?.z.p+til 1000;t:`b`x`s`b)
q)asc `p xcols ungroup select p,til count i by t from t
p t x
---------------------------------
2017.05.16D09:42:48.259062090 b 0
2017.05.16D09:42:48.259062585 x 0
2017.05.16D09:42:48.259062683 s 0
2017.05.16D09:42:48.259062858 b 1
Ps: Note I have started the sequence at 0 as if to say "I've had 0 events prior to this row" instead of beginning at 1 as per your example. It goes with your req "number of events up to that ts". If you need 1, just add 1 '1+til count i'. Also ensure your time is sorted so as it makes sense when beginning the sequence.
With table t as below:
q)show t: ([]ts:.z.t - desc "u"$(til 4);symb:`b`x`z`b)
ts symb
-----------------
09:46:56.384 b
09:47:56.384 x
09:48:56.384 z
09:49:56.384 b
using a vector conditional:
q)select ts, cum_count:sums ?[symb=`b;1;0] from t
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
The same, but with a function taking symb as a parameter:
q){select ts, cum_count:sums ?[symb=x;1;0] from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
In fact you don't need a vector conditional because you can just sum the booleans directly:
q){select ts, cum_count:sums symb=x from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
This also works
update x:1+til count i by b from tab

kdb+: function with two arguments from columns

I have a function that does something with a date and a function that takes two arguments to perform a calculation. For now let's assume that they look as follows:
d:{[x] :x.hh}
f:{[x;y] :x+y}
Now I want to use function f in a query as follows:
select f each (columnOne,d[columnTwo]) from myTable
Hence, I first want to convert one column to the corresponding numbers using function d. Then, using both columnOne and the output of d[columnTwo], I want to calculate the outcome of f.
Clearly, the approach above does not work, as it fails with a 'rank error.
I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
How do I do this? Note that I need to input columnOne and columnTwo into f such that the corresponding rows still match. E.g. input row 1 of columnOne and row 1 of columnTwo simultaneously into f.
I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
You're very close with that code. The issue is the d function, in particular the x.hh within function d - the .hh notation doesn't work in this context, and you will need to do `hh$x instead, so d becomes:
d:{[x] :`hh$x}
So making only this change to the above code, we get:
q)d:{[x] :`hh$x}
q)f:{[x;y] :x+y}
q)myTable:([] columnOne:10?5; columnTwo:10?.z.t);
q)update res:f ./: (columnOne,'d[columnTwo]) from myTable
columnOne columnTwo res
--------------------------
1 21:10:45.900 22
0 20:23:25.800 20
2 19:03:52.074 21
4 00:29:38.945 4
1 04:30:47.898 5
2 04:07:38.923 6
0 06:22:45.093 6
1 19:06:46.591 20
1 10:07:47.382 11
2 00:45:40.134 2
(I've changed select to update so you can see other columns in result table)
Other syntax to achieve the same:
q)update res:f'[columnOne;d columnTwo] from myTable
columnOne columnTwo res
--------------------------
1 21:10:45.900 22
0 20:23:25.800 20
2 19:03:52.074 21
4 00:29:38.945 4
1 04:30:47.898 5
2 04:07:38.923 6
0 06:22:45.093 6
1 19:06:46.591 20
1 10:07:47.382 11
2 00:45:40.134 2
Only other note worthy point - in the above example, function d is vectorised (works with vector arg), if this wasn't the case, you'd need to change d[columnTwo] to d each columnTwo (or d'[columnTwo])
This would then result in one of the following queries:
select res:f'[columnOne;d'[columnTwo]] from myTable
select res:f ./: (columnOne,'d each columnTwo) from myTable
select res:f ./: (columnOne,'d'[columnTwo]) from myTable