Reshape [cols;table] - kdb

How do I get columns from a table? If they don't exist it's ok to get them as null columns.
Trying reshape (#):
q)d:`a`b!1 2
q)enlist d
a b
---
1 2
q)`a`c#d
a| 1
c|
q)`a`c#enlist d
'c
[0] `a`c#enlist d
^
Why does the reshape (#) operator not work on a table? It could easily act on each row (which is a dict) and combine the results. So I'm forced to write:
q)`a`c#/:enlist d
a c
---
1
Is it the shortest way?

Any key you try to take (#) which is not present in a dictionary will be assigned a null value of the same type as the first value in the dictionary. Similar behaviour is not available for tables.
q)`a`c#`a`b!(1 2;())
a| 1 2
c| `long$()
q)`b`c#`a`b!(();1 2)
b| 1 2
c| ()
As you mentioned, each-right (/:) acts on each row of the table, i.e. on each dictionary. If you already have the dictionary, you can skip the iterator: take from the dictionary first and enlist the result. This returns the same output and is faster (roughly 2x in the timings below).
q)d:`a`b!1 2
q)enlist`a`c#d
a c
---
1
q)(`a`c#/:enlist d)~enlist`a`c#d
1b
q)\ts:1000000 enlist`a`c#d
395 864
q)\ts:1000000 `a`c#/:enlist d
796 880
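For a table with more than one row there is no single dictionary to take from, so the each-right form is the general one. A minimal sketch, assuming a hypothetical three-row table t that has no column c:

```q
/ hypothetical example table; column c does not exist
t:([]a:1 2 3;b:4 5 6)

/ take a and c from each row dictionary; the rows recombine into a table
r:`a`c#/:t

/ the missing column comes back filled with nulls
cols r
null r`c
```

As described above, the null type follows the first value in each row dictionary, so with mixed column types the filled column may not have the type you expect.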

Related

How do I parameterise the column list in KDB?

I have a number of repetitive queries:
select lR, e10, e25, vr from z
Is there a way I can do something like:
features: `lR`e10`e25`vr
select features from z
You could use # like so:
`lR`e10`e25`vr#z
NB: The left argument here must be a list so to select a single column use the following:
enlist[`vr]#z
Example:
q)t:([]a:`a`b`c;b:til 3;c:0b);
q)`a`b#t
a b
---
a 0
b 1
c 2
Another approach is to use a functional form (which you can build using parse):
q)0N!parse"select lR, e10, e25, vr from z";
(?;`z;();0b;`lR`e10`e25`vr!`lR`e10`e25`vr)
q)features:`lR`e10`e25`vr
q)?[z;();0b;features!features]
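The functional form can be wrapped in a small helper so the column list becomes an ordinary argument. This is only a sketch: the name pick and the sample table z are assumptions made for illustration.

```q
/ hypothetical table standing in for z
z:([]lR:1 2;e10:3 4;e25:5 6;vr:7 8;other:9 10)

/ pick is an illustrative helper name, not a built-in
/ (),c coerces a single symbol to a list so that c!c is always a dictionary
pick:{[t;c] c:(),c; ?[t;();0b;c!c]}

/ same columns as: select lR,e10,e25,vr from z
pick[z;`lR`e10`e25`vr]

/ a single column works without enlist
pick[z;`vr]
```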
If you use # for this then be aware it will fail on a keyed table.
One possible way of modifying it to work on any table would be something like:
f:{[t;c] if[not .Q.qt[t]; '"Input is not a table"]; c,:(); $[99h = type[t];c#/:t;c#t]}
The function checks that the input is, in fact, a table, coerces the columns to a list, and then performs the required # operation (using each-right when the table is keyed).
q)t
a| b c d
-| ------
a| 4 7 10
b| 5 8 11
c| 6 9 12
q)f[t;`b]
a| b
-| -
a| 4
b| 5
c| 6
q)f[0!t;`b]
b
-
4
5
6
q)f[flip 0!t;`b]
'Input is not a table
[0] f[flip 0!t;`b]
^

Restrict table columns, preserving keys

I've found in "Q Tips" a technique to preserve keys in a table. This is useful for restricting columns in the right table in lj for example, without re-applying a key. Using each:
q)show t:(`c1`c2!1 2;`c1`c2!3 4)!(`c3`c4`c5!30 40 50;`c3`c4`c5!31 41 51)
c1 c2| c3 c4 c5
-----| --------
1 2 | 30 40 50
3 4 | 31 41 51
q)`c3`c4#/:t
c1 c2| c3 c4
-----| -----
1 2 | 30 40
3 4 | 31 41
I’m trying to understand why it preserves a key part of the table t:
q){-3!x}/:t
'/:
[0] {-3!x}/:t
^
But in this case q doesn’t show how it treats each row of the keyed table.
So why does the syntax #/:t work this way for a keyed table? Is it mentioned anywhere in the code.kx.com docs?
Upd1: I've found a case with # and keyed table on code.kx.com, but it is about selecting rows, not columns.
If you view the keyed table as a dictionary (which it is) then it's no different to:
q)2*/:`a`b!1 2
a| 2
b| 4
or
q){x+1} each `a`b!1 2
a| 2
b| 3
The keys are retained when applying a function to each element of a dictionary. In your example the function being applied is to use take on a dictionary, e.g:
q)`c3`c4#first t
c3| 30
c4| 40
Doing that for each row returns a list of dictionaries, which is itself a table.
Also, your other attempt would work as:
{-3!x}@/:t
so it's not unique to take (#).
{-3!x}/:t
Each-right needs two arguments, so this won't work.
Since the table is keyed, it is treated as a dictionary. Each-right iterates over the dictionary values, so the keys of the main dictionary (the key columns) are carried through unchanged. To see what is happening, it might help to look at the result of using each:
q)){-3!x} each t
c1 c2|
-----| --------------------
1 2 | "`c3`c4`c5!30 40 50"
3 4 | "`c3`c4`c5!31 41 51"
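One way to see the dictionary view described above is to split the keyed table with key and value, restrict only the value table, and re-attach the keys. A minimal sketch, rebuilding the same t with table-literal syntax (an assumption made for brevity):

```q
/ same keyed table as above, written as a table literal
t:([c1:1 3;c2:2 4];c3:30 31;c4:40 41;c5:50 51)

/ key t is the table of key columns, value t the table of value columns
k:key t
v:value t

/ restrict only the value table, then re-attach the keys
r:k!`c3`c4#v

/ compare with the each-right form, which iterates over value t and keeps key t
r~`c3`c4#/:t
```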

Different results of flip on select and on index-accessed table in kdb+

In a q session I've made a keyed table t:
q)/KDB+ 3.6 2018.05.17
q)f:flip (`a`b)!(1 2 3;4 5 6)
q)k:flip (enlist `k)!(enlist 101 102 103)
q)t:k!f;t
k | a b
---| ---
101| 1 4
102| 2 5
103| 3 6
Then I tried a query, and it gives nice results:
q)select a,b from t where k=101
a b
---
1 4
q)flip select a,b from t where k=101
a| 1
b| 4
q)flip flip select a,b from t where k=101
a b
---
1 4
But without select-syntax this gives an error:
q)t[101]
a| 1
b| 4
q)flip t[101]
'rank
[0] flip t[101]
^
Why can't I just make a simple flip on the same result as from select of the same data types?
q)type flip select a,b from t where k=101
99h
q)type t[101]
99h
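Because the values of the dictionary t[101] are atoms, not lists. flip needs a dictionary whose values are lists of equal length (a column dictionary), so it fails on a dictionary of atoms.
Joining each value to an empty list first turns the atoms into one-item lists, after which flip will work.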
q)(),/:t[101]
a| 1
b| 4
That is not necessarily something you want to do, though. To turn a single dictionary into a one-row table, the solution you probably want is enlist:
q)enlist t[101]
a b
---
1 4
An alternative approach would be to look up using a table rather than an atom:
q)t[([]k:(),101)]
a b
---
1 4
That would be the equivalent of select a,b from t where k=101
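The two fixes above can be compared side by side. A minimal sketch using the same t as in the question; the names d, r1 and r2 are introduced here for illustration only:

```q
f:flip `a`b!(1 2 3;4 5 6)
k:flip (enlist `k)!enlist 101 102 103
t:k!f

/ indexing by key returns a dictionary whose values are atoms
d:t 101
type each d

/ enlist each value to get a column dictionary, which flip accepts
r1:flip enlist each d

/ enlist the dictionary itself: a list of one dictionary is a one-row table
r2:enlist d

/ both routes should give the same one-row table
r1~r2
```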

Parameterize select query in unary kdb function

I'd like to be able to select rows in batches from a very large keyed table being stored remotely on disk. As a toy example to test my function I set up the following tables t and nt...
t:([sym:110?`A`aa`Abc`B`bb`Bac];px:110?10f;id:1+til 110)
nt:0#t
I select from the table only the records whose sym begins with the character "A", count the number of records, divide the count by the number of rows I would like to fetch for each function call (10), and round that up to the nearest whole number...
aRec:select from t where sym like "A*"
counter:count aRec
divy:counter%10
divyUP:ceiling divy
Next I set an idx variable to 0 and write an if statement as the parameterized function. This checks if idx equals divyUP. If not, then it should select the first 10 rows of aRec, upsert those to the nt table, increment the function argument, x, by 10, and increment the idx variable by 1. Once the idx variable and divyUP are equal it should exit the function...
idx:0
batches:{[x]if[not idx=divyUP;batch::select[x 10]from aRec;`nt upsert batch;x+:10;idx+::1]}
However when I call the function it returns a type error...
q)batches 0
'type
[1] batches:{[x]if[not idx=divyUP;batch::select[x 10]from aRec;`nt upsert batch;x+:10;idx+::1]}
^
I've tried using it with sublist too, though I get the same result...
batches:{[x]if[not idx=divyUP;batch::x 10 sublist aRec;`nt upsert batch;x+:10;idx+::1]}
q)batches 0
'type
[1] batches:{[x]if[not idx=divyUP;batch::x 10 sublist aRec;`nt upsert batch;x+:10;idx+::1]}
^
However issuing either of those above commands outside of the function both return the expected results...
q)select[0 10] from aRec
sym| px id
---| ------------
A | 4.236121 1
A | 5.932252 3
Abc| 5.473628 5
A | 0.7014928 7
Abc| 3.503483 8
A | 8.254616 9
Abc| 4.328712 10
A | 5.435053 19
A | 1.014108 22
A | 1.492811 25
q)0 10 sublist aRec
sym| px id
---| ------------
A | 4.236121 1
A | 5.932252 3
Abc| 5.473628 5
A | 0.7014928 7
Abc| 3.503483 8
A | 8.254616 9
Abc| 4.328712 10
A | 5.435053 19
A | 1.014108 22
A | 1.492811 25
The issue is that select[] and sublist both require a list as input here, but x 10 is not a list. Separating values with spaces only forms a list when every item is a literal (a simple list such as 0 10). Once a variable is one of the items, x 10 is parsed as applying x to 10, which gives the 'type error. In this case, parentheses and a semicolon are required.
q) x:2
q) (1;x) / (1 2)
Select command: change the input to (x;10) to make it work.
q) t:([]id:1 2 3; v: 3 4 5)
q) {select[(x;2)] from t} 1
id v
----
2 4
3 5
Another alternative is to use the virtual column i (row index). Note that within is inclusive at both ends, so the range for 2 rows starting at x is x + 0 1:
q) {select from t where i within x + 0 1} 1
Sublist command: make the left argument of sublist a list, (x;10).
q) {(x;2) sublist t}1
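Building on the (x;n) form, the batching itself can be written without a global counter by generating the starting offsets up front. This is a sketch under assumptions: the table is a hypothetical unkeyed one with deterministic values, and batch, starts and batches are illustrative names.

```q
/ hypothetical unkeyed table with deterministic values
t:([]id:1+til 25;px:0.5*til 25)

/ illustrative helper: n rows of tab starting at row start
batch:{[tab;start;n] (start;n) sublist tab}

/ starting offsets for batches of 10
starts:10*til ceiling count[t]%10

/ one table per batch; the last one is shorter
batches:batch[t;;10] each starts

/ batch sizes, and a check that the batches reassemble the original table
count each batches
raze[batches]~t
```

Each element of batches could then be upserted to the target table in turn, as in the question.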
You can't use the select[] form with variable input like that. Instead you can use a functional select, as shown in https://code.kx.com/q4m3/9_Queries_q-sql/#912-functional-forms, where the 5th argument specifies the rows you want.
Hope this helps!

How can I count the null entries by column in a kdb q table?

Given a table that contains a number of null entries how can I create a summary table that describes the number of nulls per column? Can this be done on a general table where the number of columns and column names are not known beforehand?
q)t: ([] a: 1 2 3 4; b: (2018.10.08; 0Nd; 2018.10.08; 2018.10.08); c: (0N;0N;30;40); d: `abc`def``jkl)
q)t
a b c d
-------------------
1 2018.10.08 abc
2 def
3 2018.10.08 30
4 2018.10.08 40 jkl
Expected result:
columnName nullCount
--------------------
a 0
b 1
c 2
d 1
While sum null t is the simplest solution in this example, it doesn't handle string (or other nested) columns. To handle those, you would need something like the following, which counts empty items in nested columns:
q)t: ([] a: 1 2 3 4; b: (2018.10.08; 0Nd; 2018.10.08; 2018.10.08); c: (0N;0N;30;40); d: `abc`def``jkl;e:("aa";"bb";"";()," "))
q){sum$[0h=type x;0=count each x;null x]}each flip t
a| 0
b| 1
c| 2
d| 1
e| 1
You can make such a table using
q)flip `columnName`nullCount!(key;value)@\:sum null t
columnName nullCount
--------------------
a 0
b 1
c 2
d 1
where sum null t gives a dictionary of the number of nulls in each column
q)sum null t
a| 0
b| 1
c| 2
d| 1
and we apply the column names as keys and flip to a table.
To produce a table with the column names as the headers and the number of nulls as the values, you can use:
q)tab:enlist sum null t
This enlists the dictionary that has the number of nulls as its values and the column names as its keys, giving a one-row table:
a b c d
-------
0 1 2 1
If you then want this in your expected format, you could use:
result:([]columnName:cols tab; nullCount:raze value each tab)
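The pieces above can be combined into one helper that handles nested columns and returns the requested layout. A minimal sketch; nullSummary is an illustrative name and the sample table is a hypothetical variant of the one in the question:

```q
/ hypothetical table with a nested (string) column e
t:([]a:1 2 3 4;c:0N 0N 30 40;d:`abc`def``jkl;e:("aa";"bb";"";"cc"))

/ nullSummary is an illustrative name, not a built-in
/ nested (type 0h) columns count empty items; all other columns use null
nullSummary:{[t] d:{sum $[0h=type x;0=count each x;null x]} each flip t; ([]columnName:key d;nullCount:value d)}

nullSummary t
```

Treating an empty item as null in a nested column is a design choice; a string such as " " would not be counted.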