I was looking at a problem of reshaping a table creating new columns according based on values.
I'm using the same example as this problem discussed there: A complicated sum in R data.table that involves looking at other columns
so I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
and the goal is to transform it to get the sum of values by group and type. please note the new columns created. basically all types in Type_v1 and Type_v2 are used to create columns for the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I did the beginning but I am unable to transform the table and create the new columns.
also of course I'm trying to get all the columns created in a dynamic way, as it would not be possible to input 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
any help and explanations appreciated,
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite get you the exact output but pivot is the concept
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross but you would have to restructure the list with flip and cut
flip n cut `$("v_";"typCol_") cross string 1 + til n
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.
The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is kind of a hacky way to get the type columns, I select each t1 t2 t3 with a typeCol_1 _2 _3,
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column
Related
As per question, I have a dictionary of tables. How do I join the values into a single table?
raze works if the schemas of the tables all conform (aka all columns are the same and in the same order). If they don't conform, a more general option is to union join over:
/tables conform
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col1:`z`w;col2:3 4))
col1 col2
---------
x 1
y 2
z 3
w 4
/column order different
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1!(3;`z)
`col2`col1!(4;`w)
/non-matching columns
q)raze `a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
`col1`col2!(`x;1)
`col1`col2!(`y;2)
`col2`col1`col3!(3;`z;0b)
`col2`col1`col3!(4;`w;1b)
/uj handles any non-conformity
q)(uj/)`a`b!(([]col1:`x`y;col2:1 2);([]col2:3 4;col1:`z`w;col3:01b))
col1 col2 col3
--------------
x 1 0
y 2 0
z 3 0
w 4 1
Use:
raze x
Raze is defined as:
Return the items of x joined, collapsing one level of nesting.
The table will not include the key, but if the key is also in each table then no information is lost.
It is easy to see what raze does:
parse "raze d"
,/
`d
As a matter of fact, personally in the past I have used the following command to achieve the same output:
(),/ d
From a starting table, let's say:
A
B
C
1
1
99
2
2
88
3
3
77
I'm trying to write a query that would result in a table with a different value in column C based on the criteria that when A has value 2, the value for C should be the existing value + the value from C where A is 1. Here's the result:
A
B
C
1
1
99
2
2
187
3
3
77
Unsure if a grouping makes sense here, especially since there might be multiple similar criteria. The closes query I could think of would be
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.
consider below approach (tested in BigQuery)
select a, b, c +
case a
when 2 then sum(if(a = 1, c, 0)) over()
else 0
end c
from your_table
if applied to sample data in your question - output is
SELECT
A,
B,
CASE
WHEN A=2 THEN C + (SELECT C FROM table WHERE A = 1)
ELSE C
END AS C
FROM
table;
While trying to map some data to a table, I wanted to obtain the ID of a table and its modulo respect the total rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
1 | 1 <- 1 mod 4
3 | 3 <- 3 mod 4
10 | 2 <- 10 mod 4
12 | 0 <- 12 mod 4
Is there an easy way to achieve this dynamically (as in, not counting the rows on before hand or doing it in an atomic way)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works but you must have the GROUP BY and tbl t2 as otherwise it returns 0 for the mod column which makes sense because I think it works by multiplying the table by itself so each ID gets a full set of the table. I guess for small enough tables this is ok but I can see how this becomes problematic for larger tables.
Edit: Found another hack-ish way:
WITH total AS (
SELECT COUNT(*) cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt mod FROM tbl t1, total t2
It similar to the previous query but it "collapses" the multiplication to a single row with the previous count.
You can use COUNT() window function:
SELECT id,
id % COUNT(*) OVER () mod
FROM tbl;
I'm sure that the optimizer is smart enough to calculate the result of the window function only once.
See the demo.
I am trying to join 2 tables like so:
left join (
select t1.createdate, min(f1.createdate) as mindt, f1.status_aft
from new_table t1
left join new_folder f1 on t1.veh_id = f1.veh_id
where f1.createdate > t1.createdate
group by t1.createdate
) h3
on t1.createdate = h3.createdate
and f1.createdate = h3.mindt
But I am getting an error:
ERROR: column "f1.status_aft" must appear in the GROUP BY clause or be used in an aggregate function
This makes sense because I do not group it, my goal is just to take the value that is in that current row when f1.createdate is min.
For example:
A B C
one 10 a
one 15 b
two 20 c
two 25 d
Becomes
A B C
one 10 a
two 20 c
Because a and c was the values when column B were the lowest after grouping it by column A.
I've seen this answer but I still can't apply it to my scenario.
How can I achieve the desired result?
my goal is just to take the value that is in that current row when f1.createdate is min.
If you want just one row, you can order by and limit:
left join (
select t1.t1.createdate, f1.createdate as mindt, f1.status_aft
from new_table t1
left join new_folder f1 on t1.veh_id = f1.veh_id
where f1.createdate > t1.createdate
order by t1.createdate limit 1
) h3
I'm trying to create a table with the following columns:
I want to use a with recursive table to do this. The following code however is giving the following error:
'ERROR: column "b" does not exist'
WITH recursive numbers AS
(
SELECT 1,2,4 AS a, b, c
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
I'm stuck because when I just include one column this works perfectly. Why is there an error for multiple columns?
This appears to be a simple syntax issue: You are aliasing the columns incorrectly. (SELECT 1,2,4 AS a, b, c) is incorrect. Your attempt has 5 columns: 1,2,a,b,c
Break it down to just: Select 1,2,4 as a,b,c and you see the error but Select 1 a,2 b,4 c works fine.
b is unknown in the base select because it is being interpreted as a field name; yet no table exists having that field. Additionally the union would fail as you have 5 fields in the base and 3 in the recursive union.
DEMO: http://rextester.com/IUWJ67486
One can define the columns outside the select making it easier to manage or change names.
WITH recursive numbers (a,b,c) AS
(
SELECT 1,2,4
UNION ALL
SELECT a+1, b+1, c+1
FROM Numbers
WHERE a + 1 <= 10
)
SELECT * FROM numbers;
or this approach which aliases the fields internally so the 1st select column's names would be used. (a,b,c) vs somereallylongalias... in union query. It should be noted that not only the name of the column originates from the 1st query in the unioned sets; but also the datatype for the column; which, must match between the two queries.
WITH recursive numbers AS
(
SELECT 1 as a ,2 as b,4 as c
UNION ALL
SELECT a+1 someReallyLongAlias
, b+1 someReallyLongAliasAgain
, c+1 someReallyLongAliasYetAgain
FROM Numbers
WHERE a<5
)
SELECT * FROM numbers;
Lastly, If you truly want to stop at 5 then the where clause should be WHERE a < 5. The image depicts this whereas the query does not; so not sure what your end game is here.