How to query table and sum up certain columns by criteria, but not others? - postgresql

From a starting table, let's say:
A
B
C
1
1
99
2
2
88
3
3
77
I'm trying to write a query that would result in a table with a different value in column C based on the criteria that when A has value 2, the value for C should be the existing value + the value from C where A is 1. Here's the result:
A
B
C
1
1
99
2
2
187
3
3
77
Unsure if a grouping makes sense here, especially since there might be multiple similar criteria. The closes query I could think of would be
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.

consider below approach (tested in BigQuery)
select a, b, c +
case a
when 2 then sum(if(a = 1, c, 0)) over()
else 0
end c
from your_table
if applied to sample data in your question - output is

SELECT
A,
B,
CASE
WHEN A=2 THEN C + (SELECT C FROM table WHERE A = 1)
ELSE C
END AS C
FROM
table;

Related

KDB: Convert a dictionary of tables into a table?

As per question, I have a dictionary of tables. How do I join the values into a single table?
raze works if the schemas of the tables all conform (aka all columns are the same and in the same order). If they don't conform, a more general option is to union join over:
/tables conform
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))
col1 col2
---------
x 1
y 2
z 3
w 4
/column order different
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1!(3;`z)
`col2`col1!(4;`w)
/non-matching columns
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1`col3!(3;`z;0b)
`col2`col1`col3!(4;`w;1b)
/uj handles any non-conformity
q)(uj/)`a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
col1 col2 col3
--------------
x 1 0
y 2 0
z 3 0
w 4 1
Use:
raze x
Raze is defined as:
Return the items of x joined, collapsing one level of nesting.
The table will not include the key, but if the key is also in each table then no information is lost.
It is easy to see what raze does:
parse "raze d"
,/
`d
As a matter of fact, personally in the past I have used the following command to achieve the same output:
(),/ d

KDB get only the rows present only in one table

I have two tables table1 and table2. Both has 4 columns with same column names. table1 has 50 rows and table2 has 100 rows. How can I get only those rows from table2, which are not there in table1. I tried performing left join, but I am not able to do that, since we can't do left join using all columns.
Since tables are lists of dictionaries, you could use the except keyword to exclude all rows from table2 which are found in table1.
For example:
q)table1:([]a:til 3;b:3#.Q.a;c:3#.Q.A)
q)table1
a b c
-----
0 a A
1 b B
2 c C
q)table2:([]a:til 6;b:6#.Q.a;c:6#.Q.A)
q)table2
a b c
-----
0 a A
1 b B
2 c C
3 d D
4 e E
5 f F
q)table2 except table1
a b c
-----
3 d D
4 e E
5 f F

reshaping table based on column values

I was looking at a problem of reshaping a table creating new columns according based on values.
I'm using the same example as this problem discussed there: A complicated sum in R data.table that involves looking at other columns
so I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
and the goal is to transform it to get the sum of values by group and type. please note the new columns created. basically all types in Type_v1 and Type_v2 are used to create columns for the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I did the beginning but I am unable to transform the table and create the new columns.
also of course I'm trying to get all the columns created in a dynamic way, as it would not be possible to input 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
any help and explanations appreciated,
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite get you the exact output but pivot is the concept
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross but you would have to restructure the list with flip and cut
flip n cut `$("v_";"typCol_") cross string 1 + til n
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.
The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is kind of a hacky way to get the type columns, I select each t1 t2 t3 with a typeCol_1 _2 _3,
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column

Calculation in a table with a value from another table

Assumed we have two tables:
Table A:
a b c
x 1 null
x 2 null
y 3 null
Table B:
a b
x 5
y 10
I want to update Table A by multiplication of TableA.b with TableB.b and writing it into TableA.c. The value of TableB should be selected by the condition TableA.a = TableB.a. Thus my updated TableA should look like this:
Table A:
a b c
x 1 5
x 2 10
y 3 30
I thought to do a join of both tables before, but im not sure. What do you think is the easiest and best solution?
In Postgres, you can use the update ... set ... from ... where syntax.
Consider:
update tablea ta
set c = ta.b * tb.b
from tableb tb
where tb.a = ta.a

Create multiple incrementing columns using with recursive in postgresql

I'm trying to create a table with the following columns:
I want to use a with recursive table to do this. The following code however is giving the following error:
'ERROR: column "b" does not exist'
WITH recursive numbers AS
(
SELECT 1,2,4 AS a, b, c
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
I'm stuck because when I just include one column this works perfectly. Why is there an error for multiple columns?
This appears to be a simple syntax issue: You are aliasing the columns incorrectly. (SELECT 1,2,4 AS a, b, c) is incorrect. Your attempt has 5 columns: 1,2,a,b,c
Break it down to just: Select 1,2,4 as a,b,c and you see the error but Select 1 a,2 b,4 c works fine.
b is unknown in the base select because it is being interpreted as a field name; yet no table exists having that field. Additionally the union would fail as you have 5 fields in the base and 3 in the recursive union.
DEMO: http://rextester.com/IUWJ67486
One can define the columns outside the select making it easier to manage or change names.
WITH recursive numbers (a,b,c) AS
(
SELECT 1,2,4
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
or this approach which aliases the fields internally so the 1st select column's names would be used. (a,b,c) vs somereallylongalias... in union query. It should be noted that not only the name of the column originates from the 1st query in the unioned sets; but also the datatype for the column; which, must match between the two queries.
WITH recursive numbers AS
(
SELECT 1 as a ,2 as b,4 as c
UNION ALL
SELECT a+1 someReallyLongAlias
, b+1 someReallyLongAliasAgain
, c+1 someReallyLongAliasYetAgain
FROM Numbers
WHERE a<5
)
SELECT * FROM numbers;
Lastly, If you truly want to stop at 5 then the where clause should be WHERE a < 5. The image depicts this whereas the query does not; so not sure what your end game is here.