KDB: Convert a dictionary of tables into a table?

As per question, I have a dictionary of tables. How do I join the values into a single table?

raze works if the schemas of the tables all conform (i.e. all tables have the same columns in the same order). If they don't conform, raze returns a list of dictionaries rather than a table; a more general option is a union join over the values:
/tables conform
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))
col1 col2
---------
x 1
y 2
z 3
w 4
/column order different
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1!(3;`z)
`col2`col1!(4;`w)
/non-matching columns
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1`col3!(3;`z;0b)
`col2`col1`col3!(4;`w;1b)
/uj handles any non-conformity
q)(uj/)`a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
col1 col2 col3
--------------
x 1 0
y 2 0
z 3 0
w 4 1
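The choice between the two can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: joinAll is a hypothetical name, and it only compares column names and order, not column types.

```q
/ hypothetical helper: raze when every table has the same columns
/ in the same order, otherwise fall back to a union join over the values
joinAll:{[d] $[1=count distinct cols each value d; raze d; (uj/) value d]}

d1:`a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))
d2:`a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))

joinAll d1    / conforming tables: joined with raze
joinAll d2    / non-conforming tables: joined with uj
```

Both calls return a table (type 98h), whereas a plain raze d2 would return a list of dictionaries.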

Use:
raze x
Raze is defined as:
Return the items of x joined, collapsing one level of nesting.
The table will not include the key, but if the key is also in each table then no information is lost.
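If the key is not already present in each table, one way to keep it is to stamp it onto every table before joining. This is a sketch under that assumption; src is a hypothetical column name chosen for illustration.

```q
d:`a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))

/ pair each key with its table, add the key as a constant column, then join
r:raze {update src:x from y}'[key d;value d]
r
```

Each row of r then carries the dictionary key it came from in src. If the tables do not conform, (uj/) can be used in place of raze.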

It is easy to see what raze does:
parse "raze d"
,/
`d
In the past I have also used the following, which gives the same result:
(),/ d

Related

Compare a column with every element of another column which is an array

I have 3 columns in a postgresql table like:
col A col B col C
---------- --------- ---------
2020-01-01 2024-01-01 {2020-01-01, 2020-05-01, 2022-03-01}
2020-05-01 2021-05-01 {2020-01-01, 2020-05-01, 2022-03-01}
2022-03-01 2023-03-01 {2020-01-01, 2020-05-01, 2022-03-01}
col C is basically the array_agg of colA over the window. What I need to check is, for each row of col A, if the datetime is >= any of the elements of the array from col C. What is the possible solution?
Note: In my actual case there's another col D, which is the array_agg of col B. So what I'll be actually checking is col A >= any of the elements of the array from col C and col B <= any of the elements of the array from col D. I mainly don't know how to compare a value with each element from an array.
The syntax here is pretty nice, you can just write
WHERE A >= Any( C )
If you need to do more complicated checks on the elements of an array, you can also use unnest to expand the array into a set of rows and then write ordinary SQL against it. For example,
WHERE 0 < (SELECT COUNT(*) FROM unnest(C) AS elt WHERE A >= elt)
is a more elaborate (but more general) way to do the same thing.

How to query table and sum up certain columns by criteria, but not others?

From a starting table, let's say:
A B C
------
1 1 99
2 2 88
3 3 77
I'm trying to write a query that would result in a table with a different value in column C based on the criteria that when A has value 2, the value for C should be the existing value + the value from C where A is 1. Here's the result:
A B C
------
1 1 99
2 2 187
3 3 77
I'm unsure whether a grouping makes sense here, especially since there might be multiple similar criteria. The closest query I could think of would be
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.
Consider the approach below (tested in BigQuery):
select a, b, c +
  case a
    when 2 then sum(if(a = 1, c, 0)) over()
    else 0
  end c
from your_table
Applied to the sample data in your question, this returns the result you asked for (C = 187 where A = 2; the other rows are unchanged).
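Alternatively, use a scalar subquery inside a CASE expression (this assumes exactly one row has A = 1):
SELECT
  A,
  B,
  CASE
    WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
    ELSE C
  END AS C
FROM
  table1;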

PSQL filter each group of rows

Recently I ran into a fairly unusual filtering case in PSQL.
My question is: how do I filter out redundant elements in each group of a grouped table?
For example, we have the following table:
id | group_idx | filter_idx
1  | 1         | x
2  | 3         | z
3  | 3         | x
4  | 2         | x
5  | 1         | x
6  | 3         | x
7  | 2         | x
8  | 1         | z
9  | 2         | z
First, to group the rows:
SELECT group_idx FROM table
GROUP BY group_idx;
But how can I filter out the redundant rows (filter_idx = 'z') from each group after grouping?
P.S. I can't just write the following, because I need to find the groups first:
SELECT group_idx FROM table
WHERE filter_idx <> 'z';
Thanks.
Assuming that you want to see all groups at all times, even when you filter out all records of some group:
drop table if exists test cascade;
create table test (id integer, group_idx integer, filter_idx character);
insert into test
(id,group_idx,filter_idx)
values
(1,1,'x'),
(2,3,'z'),
(3,3,'x'),
(4,2,'x'),
(5,1,'x'),
(6,3,'x'),
(7,2,'x'),
(8,1,'z'),
(9,2,'z'),
(0,4,'y');--added an example of a group that would be discarded using WHERE.
Get groups in one query, filter your rows in another, then left join the two.
select groups.group_idx,
string_agg(filtered_rows.filter_idx,',')
from
(select distinct group_idx from test) groups
left join
(select group_idx,filter_idx from test where filter_idx<>'y') filtered_rows
using (group_idx)
group by 1;
-- group_idx | string_agg
-------------+------------
-- 3 | z,x,x
-- 4 |
-- 2 | x,x,z
-- 1 | x,x,z
--(4 rows)

reshaping table based on column values

I was looking at a problem of reshaping a table, creating new columns based on values.
I'm using the same example as the problem discussed here: A complicated sum in R data.table that involves looking at other columns
so I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
The goal is to transform it to get the sum of values by group and type. Note the new columns created: all types in Type_v1 and Type_v2 are used to create columns in the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I have made a start, but I am unable to transform the table and create the new columns.
I also need all the columns to be created dynamically, as it would not be possible to enter 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
any help and explanations appreciated,
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite give you the exact output, but pivoting is the concept you need.
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross but you would have to restructure the list with flip and cut
flip n cut `$("v_";"typCol_") cross string 1 + til n
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.
The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
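To check that a functional select really matches the qSQL it replaces, the two can be compared on a toy table. This is a small sketch; the table t below is made up for illustration and uses the renamed columns from the answer.

```q
/ toy table with one value/type column pair
t:([]grpCol:1 1 2;v_1:10 12 14;typCol_1:`t1`t2`t1)

/ qSQL form
a:select grpCol, v:v_1, typCol:typCol_1 from t

/ functional form: ?[table; where; by; columns]
b:?[t;();0b;`grpCol`v`typCol!`grpCol`v_1`typCol_1]

a~b    / 1b when both forms return the same table
```

Swapping in `grpCol`v_2`typCol_2 as the dictionary values is all the lambda in the answer does for each further column pair.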
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is a slightly hacky way to get the type columns: I select each of t1, t2, t3 together with a matching typCol_1, typCol_2, typCol_3,
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column

Delete column based on type in KDB+

I have a table with a bunch of columns of various types. I need to delete all columns of a particular type, but I can't figure out how to do this.
I would like something like this:
delete from quotes where type = 11
But this doesn't work. Is there a way to do this? I was also able to list the relevant columns with the command
select c from meta quotes where t="s"
But this gives me a one column table with the column headings and I don't know where to go from there.
You could use a functional delete (!), a take (#), or a drop (_):
q)t:([] col1:`a`b`c;col2:1 2 3;col3:`x`y`z;col4:"foo")
q)![t;();0b;exec c from meta[t] where t="s"]
col2 col4
---------
1 f
2 o
3 o
q)(exec c from meta[t] where t<>"s")#t
col2 col4
---------
1 f
2 o
3 o
q)(exec c from meta[t] where t="s") _ t
col2 col4
---------
1 f
2 o
3 o
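The take form can be wrapped in a reusable function. This is a minimal sketch: dropType is a hypothetical name, the parameters are deliberately not called c or t because those are column names of meta and would be shadowed inside the query, and it assumes at least one column survives the filter.

```q
/ hypothetical helper: keep only the columns whose meta type char differs from typ
dropType:{[tab;typ] (exec c from meta tab where t<>typ)#tab}

t:([] col1:`a`b`c;col2:1 2 3;col3:`x`y`z;col4:"foo")
dropType[t;"s"]    / removes the symbol columns, leaving col2 and col4
dropType[t;"c"]    / removes the char column, leaving col1, col2 and col3
```

The take form is used here because a functional delete with an empty column list has the same parse tree as delete from t, which removes rows rather than columns when no column matches.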