Getting table rows count by using count 1 - kdb

To get the row count per group, I tried a naive approach: the count 1 construct. It works in a simple case:
q)t:([]sym:`a`a`b`b);
q)select cnt: count 1 by sym from t
sym| cnt
---| ---
a | 2
b | 2
But when I added other fields, I got the wrong result:
q)select cnt: count 1, sym by sym from t
sym| cnt sym
---| -------
a | 1 a a
b | 1 b b
Why does count 1 work (or just seem to) in the one-column case and fail with multiple columns?
Update: I expected to get something like this:
sym| cnt sym
---| -------
a | 2 a a
b | 2 b b

I don't think count 1 will produce the result you're looking for, nor even a consistent one.
I think you might want to use count i instead, where i is the virtual row-index column. With by sym, the rows are then counted within each sym group.
q)t:([]sym:`a`a`b`b)
q)select cnt:count i,sym by sym from t
sym| cnt sym
---| -------
a | 2 a a
b | 2 b b
q).z.K
3.6
Note, however, that this solution does not work on kdb+ 4.0, which rejects the duplicate column name sym:
q)t:([]sym:`a`a`b`b)
q)select cnt:count i,sym by sym from t
'dup names for cols/groups sym
[0] select cnt:count i,sym by sym from t
^
q).z.K
4f
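One possible workaround on 4.0, sketched here on the assumption that any non-clashing name is acceptable for the grouped column, is to select the sym values under a different name. The name syms is purely illustrative and does not come from the original answer.
q)t:([]sym:`a`a`b`b)
q)select cnt:count i,syms:sym by sym from t / cnt is 2 per group; syms holds `a`a and `b`b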

Related

kdb: filter table match symbol column: ~ vs =

For a where clause to filter on symbol columns, = works fine, but why does the match operator ~ not work?
q)t:([sym:`aa`bb]qty:20 30)
q)t
sym| qty
---| ---
aa | 20
bb | 30
q)select from t where sym=`aa
sym| qty
---| ---
aa | 20
q)select from t where sym~`aa
sym| qty
---| ---
Match compares `aa to the entire symbol column and returns a single boolean, whereas equals compares it to each element:
q)`a=`a`b`c
100b
q)`a~`a`b`c
0b
You could do
q)select from t where sym~\:`aa
sym| qty
---| ---
aa | 20

Creating nested columns in kdb table

I'd like to create a nested list for one of my table's columns, but I'm unsure of the syntax to use. If for instance I had the following table...
q)t:([]submitter:`A`B`C; code:3?100; status:110b)
q)t
submitter code status
---------------------
A 2 1
B 39 1
C 64 0
I want to do something similar to below. However this will add the additional column x to the table and place the value there instead of creating a compound list for the code column....
q)update code,:77 from t where status<>1b
submitter code status x
------------------------
A 2 1
B 39 1
C 64 0 77
If it were a dictionary with a single value I would do the following...
q)d:`submitter`code`status!(`A;1?100;1)
q)d
submitter| `A
code     | ,88
status   | 1
q)d[`code],:99
q)d
submitter| `A
code     | 88 99
status   | 1
How do I perform the same operation on a table with multiple rows?
My desired output would look like...
q)t
submitter code status
----------------------
A 2 1
B 39 1
C 64 77 0
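This would also do it for you, and it doesn't require changing the column type in advance. Indexing (77;()) with status picks () where status is 1b and 77 where it is 0b, and ,' joins each pick onto the matching code: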
q)update code:(code,'(77;())status) from t
submitter code status
---------------------
A ,12 1
B ,10 1
C 1 77 0
You can't change the column type of your code column on-the-fly like you intend to do.
Instead, you first have to change the type of the code column from long to a list of longs:
q)meta t
c | t f a
---------| -----
submitter| s
code | j
status | b
Update the type:
t: update enlist each code from t
Now the type of code is "J", which is indeed a list of long:
q)meta t
c | t f a
---------| -----
submitter| s
code | J
status | b
And then you can append an element to the code like this:
t:update code:{x,77} each code from t where status<>1b
q)t
submitter code status
----------------------
A ,2 1
B ,39 1
C 64 77 0

Parameterize select query in unary kdb function

I'd like to be able to select rows in batches from a very large keyed table being stored remotely on disk. As a toy example to test my function I set up the following tables t and nt...
t:([sym:110?`A`aa`Abc`B`bb`Bac];px:110?10f;id:1+til 110)
nt:0#t
I select only the records whose sym begins with the character "A", count the selected rows, divide the count by the number of rows I would like to fetch per function call (10), and round up to the nearest whole number...
aRec:select from t where sym like "A*"
counter:count aRec
divy:counter%10
divyUP:ceiling divy
Next I set an idx variable to 0 and write an if statement as the parameterized function. This checks if idx equals divyUP. If not, then it should select the first 10 rows of aRec, upsert those to the nt table, increment the function argument, x, by 10, and increment the idx variable by 1. Once the idx variable and divyUP are equal it should exit the function...
idx:0
batches:{[x]if[not idx=divyUP;batch::select[x 10]from aRec;`nt upsert batch;x+:10;idx+::1]}
However when I call the function it returns a type error...
q)batches 0
'type
[1] batches:{[x]if[not idx=divyUP;batch::select[x 10]from aRec;`nt upsert batch;x+:10;idx+::1]}
^
I've tried using it with sublist too, though I get the same result...
batches:{[x]if[not idx=divyUP;batch::x 10 sublist aRec;`nt upsert batch;x+:10;idx+::1]}
q)batches 0
'type
[1] batches:{[x]if[not idx=divyUP;batch::x 10 sublist aRec;`nt upsert batch;x+:10;idx+::1]}
^
However issuing either of those above commands outside of the function both return the expected results...
q)select[0 10] from aRec
sym| px id
---| ------------
A | 4.236121 1
A | 5.932252 3
Abc| 5.473628 5
A | 0.7014928 7
Abc| 3.503483 8
A | 8.254616 9
Abc| 4.328712 10
A | 5.435053 19
A | 1.014108 22
A | 1.492811 25
q)0 10 sublist aRec
sym| px id
---| ------------
A | 4.236121 1
A | 5.932252 3
Abc| 5.473628 5
A | 0.7014928 7
Abc| 3.503483 8
A | 8.254616 9
Abc| 4.328712 10
A | 5.435053 19
A | 1.014108 22
A | 1.492811 25
The issue is that both select[] and sublist need a list (start;count) as input, but x 10 is not a list. Space-separated values form a list only when every item is a literal, as in 0 10. Once an item is a variable, juxtaposition means application: x 10 tries to apply x to 10, which fails with 'type. To build a list from a variable, use parentheses and semicolons:
q) x:2
q) (1;x) / (1 2)
Select command: Change input to (x;10) to make it work.
q) t:([]id:1 2 3; v: 3 4 5)
q) {select[(x;2)] from t} 1
id v
----
2  4
3  5
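Another alternative is to use the virtual i (row index) column. Note that within is inclusive at both ends, so x+0 1 selects the two rows starting at x:
q) {select from t where i within x+0 1} 1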
Sublist Command: Convert left input of the sublist function to a list (x;10).
q) {(x;2) sublist t}1
You can't use the select[] form with variable input like that. Instead, you can use a functional select, as shown in https://code.kx.com/q4m3/9_Queries_q-sql/#912-functional-forms, passing the rows you want as the 5th argument.
Hope this helps!
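Putting the (x;10) fix together, a minimal sketch of the batching loop might look like the following. It assumes aRec, nt and divyUP are defined as in the question. The function name fetch, and the use of the do form of over (/) in place of the idx counter, are illustrative choices and not part of the original answers.
q)fetch:{[start] `nt upsert (start;10) sublist aRec; start+10} / upsert one batch, return the next offset
q)fetch/[divyUP;0] / apply fetch divyUP times, starting at offset 0
Because nt is keyed on sym, upsert overwrites any row whose key is already present instead of appending it, so with the repeated sym values in this toy data nt is likely to hold fewer rows than aRec.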

Kdb upsert with conditional syntax?

Is there a way I can upsert in kdb where the following occurs:
If key is not present, insert values
If key is present, check if current value is greater
A) If so, perform no action
B) If not, update values
Something like:
job upsert ([title: job1] time: enlist 1 where time > 1)
Since you're using a keyed table, and you want to change values only if they're greater and add in new keys and values, you can try avoiding upsert entirely:
t:([job:`a`b`c] val: 4 4 4) /current table
nt:([job:`a`c`d]val: 6 1 5) /new values to check
t|nt
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
This will automatically add keys that aren't there, and update the current value to the new value if the new value is larger.
Please find a solution and explanation below. I'll edit if I come up with a better way. I also hope I interpreted the question correctly.
q)t1
name | age height
-------| ----------
michael| 26 173
john | 57 156
sam | 23 134
jimmy | 83 183
conor | 32 145
jim | 64 167
q)t2
name age height
---------------
john 98 220
mary 24 230
jim 50 240
q)t1 upsert t2 where{$[all null n:x[y`name];1b;y[`age]>n[`age]]}[t1;]each t2
name | age height
-------| ----------
michael| 26 173
john | 98 220
sam | 23 134
jimmy | 83 183
conor | 32 145
jim | 64 167
mary | 24 230
q)
Explanation:
The function takes two arguments: x is the keyed table t1 and y is each record of t2 (as a dictionary). First we extract the name from the t2 record (y`name), index into the keyed table with it, and store the result in the local variable n. If the name exists, the matching record is returned from x as a dictionary (and all null n is false); otherwise a record of nulls is returned (and all null n is true). If the name from t2 is not found in t1, the function simply returns 1b. Otherwise we compare the ages of the two records: n[`age] is the age in t1 for the matching name, and y[`age] is the age in this record of t2. If y[`age] is greater, we return 1b; otherwise we return 0b.
Applied with each, the function yields a list of booleans, one for each record in t2. 1b is returned in two scenarios:
(1) The name from t2 has no match in t1.
(2) The name from t2 has a match in t1 and its age is greater than the corresponding age in t1.
0b is returned when the name matches and the age in t2 is not greater than the corresponding age in t1.
In our example the result of the function is 110b and after we apply where to this, the result is the indexes where the list value is true i.e. where 110b --> 0 1. We use this list to index into t2 which returns the first 2 records from t2(these are either new records or records where the age is greater than what is referenced in t1), then we simply upsert this into t1.
I hope this helps and hope some better solutions come along.
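To make the steps in the explanation concrete, here is the same upsert split into its intermediate results. The definitions of t1 and t2 are reconstructed from the tables displayed above, and the names keep and mask are illustrative only.
q)t1:([name:`michael`john`sam`jimmy`conor`jim]age:26 57 23 83 32 64;height:173 156 134 183 145 167)
q)t2:([]name:`john`mary`jim;age:98 24 50;height:220 230 240)
q)keep:{$[all null n:x[y`name];1b;y[`age]>n[`age]]} / same test as in the answer
q)mask:keep[t1;]each t2 / 110b: john is older in t2, mary is new, jim is not older
q)where mask / 0 1, the rows of t2 to keep
q)t1 upsert t2 where mask / same result as the one-line version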
For a table, a key, and a value: upsert the tuple if the key is new or the value exceeds the existing value.
q)t:([job:`a`b`c] val: 4 4 4) /current table
q)t[`a]|:6 /old key, higher value
q)t
job| val
---| ---
a | 6
b | 4
c | 4
q)t[`c]|:1 /old key, lower value
q)t
job| val
---| ---
a | 6
b | 4
c | 4
q)t[`d]|:5 /new key
q)t
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
Remarks
A keyed table with a single data column could perhaps be a dictionary.
Amending through an operator works also with a new key.
Upserting a table (or dictionary) of new records is more efficient and simpler than updating a single tuple.
q)nt:([job:`a`c`d]val: 6 1 5) /new values to check
q)t|nt /maximum of two tables
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
or just
q)t[([]job:`a`c`d)]|:([]val:6 1 5)
Simple-looking primitives such as maximum (|) repay careful study.

KDB: select first n rows from each group

How can I extract the first n rows from each group? For example: for table
bb: ([]sym:(4#`a),(5#`b);val: til 9)
sym val
-------------
a 0
a 1
a 2
a 3
b 4
b 5
b 6
b 7
b 8
How can I select the first 2 rows of each group by sym?
Thanks
Can use fby:
q)select from bb where ({x in 2#x};i) fby sym
sym val
-------
a 0
a 1
b 4
b 5
You can try this:
q)select from bb where i in raze exec 2#i by sym from bb
sym val
-------
a 0
a 1
b 4
b 5
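Either approach generalises to an arbitrary n. The sketch below wraps the fby version in a function. The name firstN and its argument order are assumptions made for illustration, and the grouping column sym is hard-coded as in the answers above.
q)bb:([]sym:(4#`a),(5#`b);val:til 9)
q)firstN:{[n;t] select from t where ({[n;x] x in n#x}[n];i) fby sym} / first n rows per sym
q)firstN[3;bb] / rows with val 0 1 2 and 4 5 6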