Compare a column with every element of another column which is an array - postgresql

I have 3 columns in a postgresql table like:
col A        col B        col C
-----------  -----------  -------------------------------------
2020-01-01   2024-01-01   {2020-01-01, 2020-05-01, 2022-03-01}
2020-05-01   2021-05-01   {2020-01-01, 2020-05-01, 2022-03-01}
2022-03-01   2023-03-01   {2020-01-01, 2020-05-01, 2022-03-01}
col C is basically the array_agg of col A over a window. What I need to check is, for each row, whether the datetime in col A is >= any of the elements of the array in col C. What is a possible solution?
Note: In my actual case there's another col D, which is the array_agg of col B. So what I'll actually be checking is whether col A >= any of the elements of the array in col C and col B <= any of the elements of the array in col D. I mainly don't know how to compare a value with each element of an array.

The syntax here is pretty nice; you can just write:
WHERE A >= ANY(C)
If you need to do more complicated checks on the elements of an array, you can also use unnest() to expand the array into rows and then write SQL against them. For example,
WHERE 0 < (SELECT COUNT(*) FROM unnest(C) AS elt WHERE A >= elt)
would be a more elaborate (but more general) way to do the same thing.
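Putting the two conditions from the note together, a query along these lines should work (a sketch, assuming a table named your_table with date columns a and b, and date-array columns c and d holding the array_agg of col A and col B):
-- keep rows where a is >= at least one element of c
-- and b is <= at least one element of d
SELECT *
FROM your_table
WHERE a >= ANY (c)
  AND b <= ANY (d);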

Related

KDB: Convert a dictionary of tables into a table?

As per the question, I have a dictionary of tables. How do I join the values into a single table?
raze works if the schemas of the tables all conform (i.e. all columns are the same and in the same order). If they don't conform, a more general option is to union join (uj) over:
/tables conform
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))
col1 col2
---------
x 1
y 2
z 3
w 4
/column order different
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1!(3;`z)
`col2`col1!(4;`w)
/non-matching columns
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1`col3!(3;`z;0b)
`col2`col1`col3!(4;`w;1b)
/uj handles any non-conformity
q)(uj/)`a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
col1 col2 col3
--------------
x 1 0
y 2 0
z 3 0
w 4 1
Use:
raze x
Raze is defined as:
Return the items of x joined, collapsing one level of nesting.
The table will not include the key, but if the key is also in each table then no information is lost.
It is easy to see what raze does:
parse "raze d"
,/
`d
In fact, I have personally used the following command in the past to achieve the same output:
(),/ d

How to query table and sum up certain columns by criteria, but not others?

From a starting table, let's say:
A  B  C
--------
1  1  99
2  2  88
3  3  77
I'm trying to write a query that produces a table with a different value in column C, based on this criterion: when A has value 2, the value for C should be the existing value plus the value of C from the row where A is 1. Here's the result:
A  B  C
--------
1  1  99
2  2  187
3  3  77
Unsure if a grouping makes sense here, especially since there might be multiple similar criteria. The closest query I could think of would be
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.
Consider the approach below (tested in BigQuery):
select a, b, c +
  case a
    when 2 then sum(if(a = 1, c, 0)) over()
    else 0
  end c
from your_table
If applied to the sample data in your question, the output matches the expected result shown above.
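If you are on PostgreSQL rather than BigQuery, the same windowed-sum idea can be written with a FILTER clause instead of if() (a sketch, assuming the table is called your_table):
SELECT a,
       b,
       c + CASE a
             -- add the total of c over rows where a = 1, computed across the whole table
             WHEN 2 THEN COALESCE(SUM(c) FILTER (WHERE a = 1) OVER (), 0)
             ELSE 0
           END AS c
FROM your_table;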
Another option is a correlated scalar subquery inside a CASE expression:
SELECT
  A,
  B,
  CASE
    WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
    ELSE C
  END AS C
FROM table1;

reshaping table based on column values

I was looking at a problem of reshaping a table, creating new columns based on values.
I'm using the same example as the problem discussed here: A complicated sum in R data.table that involves looking at other columns
So I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
The goal is to transform it to get the sum of values by group and type. Please note the new columns created: basically, all types in Type_v1 and Type_v2 are used to create columns in the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I did the beginning, but I'm unable to transform the table and create the new columns.
Also, of course, I'm trying to get all the columns created dynamically, as it would not be possible to input 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
Basically, I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
Any help and explanations appreciated.
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite get you the exact output, but pivoting is the key concept.
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns, as symbols are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross, but you would have to restructure the list with flip and cut:
flip n cut `$("v_";"typCol_") cross string 1 + til n
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, one for each column pair in the cls variable. Notice how I don't explicitly declare x or y in the function like this: {[x;y]}. This is because x, y and z can be used implicitly, so the function works with or without them.
The important part here is the last param (columns). For a functional select it is a dictionary with the column names as the keys and the column expressions as the values.
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is kind of a hacky way to get the type columns: I select each t1 t2 t3 alongside a typCol_1, typCol_2, typCol_3,
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column

Create multiple incrementing columns using with recursive in postgresql

I'm trying to create a table with multiple incrementing columns. I want to use a WITH RECURSIVE query to do this. The following code, however, gives the following error:
'ERROR: column "b" does not exist'
WITH recursive numbers AS
(
SELECT 1,2,4 AS a, b, c
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
I'm stuck because when I include just one column this works perfectly. Why is there an error for multiple columns?
This appears to be a simple syntax issue: you are aliasing the columns incorrectly. (SELECT 1,2,4 AS a, b, c) is incorrect; that attempt has 5 columns: 1, 2, a, b, c.
Break it down to just Select 1,2,4 as a,b,c and you see the error, but Select 1 a, 2 b, 4 c works fine.
b is unknown in the base select because it is being interpreted as a field name, yet no table exists having that field. Additionally, the union would fail, as you have 5 fields in the base query and 3 in the recursive part.
DEMO: http://rextester.com/IUWJ67486
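A minimal illustration of the aliasing behaviour (hypothetical standalone statements, runnable in PostgreSQL):
-- fails: only the literal 4 gets the alias "a"; b and c are parsed as column references
SELECT 1, 2, 4 AS a, b, c;
-- works: each literal gets its own alias
SELECT 1 AS a, 2 AS b, 4 AS c;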
One can define the columns outside the select, making it easier to manage or change the names.
WITH recursive numbers (a,b,c) AS
(
SELECT 1,2,4
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
Or this approach, which aliases the fields internally so that the first select's column names are used ((a,b,c) vs. someReallyLongAlias... in the union query). It should be noted that not only the column names originate from the first query in the unioned set, but also the column datatypes, which must match between the two queries.
WITH recursive numbers AS
(
SELECT 1 as a ,2 as b,4 as c
UNION ALL
SELECT a+1 someReallyLongAlias
, b+1 someReallyLongAliasAgain
, c+1 someReallyLongAliasYetAgain
FROM Numbers
WHERE a<5
)
SELECT * FROM numbers;
Lastly, if you truly want to stop at 5, then the where clause should be WHERE a < 5. The image in your question depicts this whereas your query does not, so I'm not sure what your end goal is here.

Efficiently finding rows where a < (max(a) for a given b)

I have a table structure like:
a          | b
-----------|---
2014-04-12 | 3
2014-03-12 | 3
2014-02-12 | 3
2014-05-12 | 4
2014-03-12 | 4
2014-04-12 | 4
I need output where a is less than the max(a) for a particular b, and a is also less than now().
What I have done till now is a self join on b, with where a < now() and having a1.a < max(b1.a).
The output is correct, but the cost of the query is very high as the number of rows in my table is quite large. Is there an alternative way of doing this?
My query is:
select a1.a, a1.b
from tab a1
JOIN tab b1
on a1.b=b1.b
where a1.a < now()
group by a1.a, a1.b
having a1.a < max(b1.a);
I think this should be faster than the self join, as only a single scan over the table is required:
select a, b
from (
  select a,
         b,
         max(a) over (partition by b) as max_a
  from the_table
  where a < now()
) t
where a < max_a;
If the condition a < now() filters out many rows, then an index on (a,b) will help. If that still leaves many rows, an index on (b,a) might be the better choice to speed up finding the max a for a given b. But only an execution plan on your real data will show that.
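For reference, the two index options mentioned above could be created like this (a sketch against the_table placeholder from the query above):
-- helps when a < now() filters out most rows
CREATE INDEX ON the_table (a, b);
-- helps when locating max(a) per b is the expensive part
CREATE INDEX ON the_table (b, a);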