I have a partitioned table, similar to the table below:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8)
q)t
date       a a_test b_test c
----------------------------
2019.01.01 1 2      3      6
2019.01.01 2 3      4      7
2019.01.01 3 4      5      8
Now I want to fetch the date column and all columns whose names have the suffix "_test" from table t.
Expected output:
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5
In my original table there are more than 100 columns whose names contain _test, so the below is not a practical solution in this case:
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like the one below, but to no avail:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable then you should use a functional qSQL statement to perform the query. The following can be used in your case:
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_test"]
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5
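If your real t is on disk as a partitioned table, the same function should work with the table name passed as a symbol (functional queries accept either form, and cols also works directly on a partitioned table):
q)query[`t;2019.01.01;cols[t] where cols[t] like "*_test"]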
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be:
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
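For example, substituting the question's actual date and column names for the dummies dt, c1 and c2 gives:
q)?[t;enlist(=;`date;2019.01.01);0b;{x!x}`date`a_test`b_test]
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5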
A functional select is probably the best way to go here if you require adding further filters.
q)?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
The functional form of any select query can be obtained by applying the -5! internal function (the equivalent of parse) to the SQL-style statement given as a string.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0   0   0   0   0   0   0   0   0   0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0   0   0   0   0   0   0   0   0   0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator, as below:
q){[x;s] (cols[x] where cols[x] like s)#x}[tab;"a*"]
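Applied to the question's in-memory table t, keeping date alongside the matched columns, that looks like the following (note that # won't generally work directly on an on-disk partitioned table):
q)(`date,cols[t] where cols[t] like "*_test")#t
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5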
What do dictionaries with a single value and multiple keys stand for?
What are their purposes?
I've accidentally created one, but cannot do anything with it:
q)type (`a`b`c)!(`d)
99h
q)((`a`b`c)!(`d))[`a]
'par
That special form usually denotes the flip of a partitioned table, where the keys represent the column names and the value represents the table name:
q)/load a database with partitioned table part_tab
q)flip part_tab
`ncej`jogn`ciha`hkpb`aeaj`blmj`ooei`jgjm`cflm`bpmc!`part_tab
This dictionary is not intended to be looked up in the usual manner, and certainly not in the way you've attempted.
It would be completely ill-advised, but it is possible to restrict the columns of a partitioned table by manipulating this dictionary:
q)select from part_tab where date=2020.01.02
date       ncej jogn ciha hkpb aeaj blmj ooei jgjm cflm bpmc
------------------------------------------------------------
2020.01.02 0    0    0    0    0    0    0    0    0    0
2020.01.02 1    1    1    1    1    1    1    1    1    1
2020.01.02 2    2    2    2    2    2    2    2    2    2
2020.01.02 3    3    3    3    3    3    3    3    3    3
...
q)part_tab:flip`ncej`jogn`ciha!`part_tab
q)select from part_tab where date=2020.01.02
date       ncej jogn ciha
-------------------------
2020.01.02 0    0    0
2020.01.02 1    1    1
2020.01.02 2    2    2
...
Again, don't try this on any large/production tables; it's an undocumented quirk.
Splayed tables have a similar dictionary when flipped:
q)flip splay
`ncej`jogn`ciha`hkpb`aeaj`blmj`ooei`jgjm`cflm`bpmc!`:splay/
The difference is that the table name has a "/" at the end and is hsym'd. This is how .Q.qp determines whether a table is partitioned or splayed.
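For reference, assuming the part_tab and splay tables from above are loaded, .Q.qp distinguishes the cases like so:
q).Q.qp part_tab  / 1b for a partitioned table
q).Q.qp splay     / 0b for a splayed table
q).Q.qp ([]a:1 2) / 0 (not boolean) for anything else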
I have a table like this:
type code desc store Sales/Day Stock
------------------------------------
1    AA1  abc  101   3         6
1    AA2  abd  101   4         0
1    AA3  abf  101   4         3
2    BA1  bba  101   5         1
2    BA2  bbc  101   2         1
1    AA1  abc  102   1         4
1    AA2  abd  102   2         0
2    BA1  bba  102   4         2
2    BA2  bbc  102   5         5
etc.
How can I show the result table like this:
type code desc Store 101         Store 102
               Sales/Day | Stock Sales/Day | Stock
--------------------------------------------------
1    AA1  abc  3         | 6     1         | 4
1    AA2  abd  4         | 0     2         | 0
1    AA3  abf  4         | 3     0         | 0
2    BA1  bba  5         | 1     4         | 2
2    BA2  bbc  2         | 1     5         | 5
etc.
Note:
The colspan is for display only.
First way: FILTER
SELECT
type,
code,
"desc",
COALESCE(SUM(sales_day) FILTER (WHERE store = 101), 0) as sales_day_101,
COALESCE(SUM(stock) FILTER (WHERE store = 101), 0) as stock_101,
COALESCE(SUM(sales_day) FILTER (WHERE store = 102), 0) as sales_day_102,
COALESCE(SUM(stock) FILTER (WHERE store = 102), 0) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
These aggregate your values. I took SUM, but with your distinct rows many other aggregate functions would do it. FILTER lets each aggregate consider only the rows for one store.
The COALESCE is there to avoid NULL values when no rows are present for an aggregation (like AA3 in store 102).
Second way: CASE WHEN
SELECT
type,
code,
"desc",
SUM(CASE WHEN store = 101 THEN sales_day ELSE 0 END) as sales_day_101,
SUM(CASE WHEN store = 101 THEN stock ELSE 0 END) as stock_101,
SUM(CASE WHEN store = 102 THEN sales_day ELSE 0 END) as sales_day_102,
SUM(CASE WHEN store = 102 THEN stock ELSE 0 END) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
The idea is the same, but the newer FILTER clause is replaced by the more common CASE expression.
Notice that "desc" is a reserved word in Postgres, so I strongly recommend renaming your column.
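If renaming is possible, it's a one-off statement (description is just an example name):
ALTER TABLE mytable RENAME COLUMN "desc" TO description;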
I am actually new to pyspark and I am trying to do some data manipulations with it.
I have a DataFrame like the example below:
Trxn Cust_ID Group
3370 A       1
8809 C       2
3525 B       3
8260 A       3
6349 B       3
3359 C       3
3701 NULL    3
5572 NULL    2
2580 A       1
In this DF, Trxn values are unique, cust_id values can repeat, and every cust_id belongs to some group. I need a final DataFrame with new group columns named Group_1, Group_2, and so on, where each holds the count of that row's cust_id's transactions in that group. Below is an example of the output:
Trxn Cust_ID Group Group_1 Group_2 Group_3
3370 A       1     2       0       1
8809 C       2     0       1       1
3525 B       3     0       0       2
8260 A       3     2       0       1
6349 B       3     0       0       2
3359 C       3     0       1       1
3701 NULL    3     0       1       1
5572 NULL    2     0       1       1
2580 A       1     2       0       1
Can someone let me know how to get this exact output in pyspark? Any help or hints would be greatly appreciated.
It seems like you are trying to do a pivot here. This post covers reshaping data with pivot in Apache Spark:
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
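For your data, a rough sketch of that approach might look like the following - the DataFrame and column names are taken from your example, and eqNullSafe is used in the join so the NULL Cust_ID rows still match their counts, as in your expected output:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(3370, "A", 1), (8809, "C", 2), (3525, "B", 3),
     (8260, "A", 3), (6349, "B", 3), (3359, "C", 3),
     (3701, None, 3), (5572, None, 2), (2580, "A", 1)],
    ["Trxn", "Cust_ID", "Group"])

# Count each customer's transactions per group; pivot creates one
# column per distinct Group value ("1", "2", "3").
counts = (df.groupBy("Cust_ID")
            .pivot("Group")
            .agg(F.count("Trxn"))
            .na.fill(0))

# Rename the pivoted columns to Group_1, Group_2, ...
for c in counts.columns:
    if c != "Cust_ID":
        counts = counts.withColumnRenamed(c, "Group_" + c)

# Join the per-customer counts back onto every transaction row.
# eqNullSafe treats two NULL Cust_IDs as equal, so the NULL rows
# keep their counts instead of dropping out of the join.
result = (df.join(counts.withColumnRenamed("Cust_ID", "cid"),
                  df["Cust_ID"].eqNullSafe(F.col("cid")),
                  "left")
            .drop("cid"))

result.show()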
I have a function that does something with a date and a function that takes two arguments to perform a calculation. For now let's assume that they look as follows:
d:{[x] :x.hh}
f:{[x;y] :x+y}
Now I want to use function f in a query as follows:
select f each (columnOne,d[columnTwo]) from myTable
Hence, I first want to convert one column to the corresponding numbers using function d. Then, using both columnOne and the output of d[columnTwo], I want to calculate the outcome of f.
Clearly, the approach above does not work, as it fails with a 'rank error.
I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
How do I do this? Note that I need to input columnOne and columnTwo into f such that the corresponding rows still match. E.g. input row 1 of columnOne and row 1 of columnTwo simultaneously into f.
I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
You're very close with that code. The issue is in the d function, in particular the x.hh - the .hh dot notation doesn't work in this context, so you will need to use `hh$x instead; d becomes:
d:{[x] :`hh$x}
So making only this change to the above code, we get:
q)d:{[x] :`hh$x}
q)f:{[x;y] :x+y}
q)myTable:([] columnOne:10?5; columnTwo:10?.z.t);
q)update res:f ./: (columnOne,'d[columnTwo]) from myTable
columnOne columnTwo    res
--------------------------
1         21:10:45.900 22
0         20:23:25.800 20
2         19:03:52.074 21
4         00:29:38.945 4
1         04:30:47.898 5
2         04:07:38.923 6
0         06:22:45.093 6
1         19:06:46.591 20
1         10:07:47.382 11
2         00:45:40.134 2
(I've changed select to update so you can see the other columns in the result table.)
Other syntax to achieve the same:
q)update res:f'[columnOne;d columnTwo] from myTable
columnOne columnTwo    res
--------------------------
1         21:10:45.900 22
0         20:23:25.800 20
2         19:03:52.074 21
4         00:29:38.945 4
1         04:30:47.898 5
2         04:07:38.923 6
0         06:22:45.093 6
1         19:06:46.591 20
1         10:07:47.382 11
2         00:45:40.134 2
The only other noteworthy point: in the above example, function d is vectorised (it works with a vector argument). If this wasn't the case, you'd need to change d[columnTwo] to d each columnTwo (or d'[columnTwo]).
This would then result in one of the following queries:
select res:f'[columnOne;d'[columnTwo]] from myTable
select res:f ./: (columnOne,'d each columnTwo) from myTable
select res:f ./: (columnOne,'d'[columnTwo]) from myTable
I would like a query that shows a sum of columns, with a default value for missing data. For example, assume I have a table as follows:
type_lookup:
id name
----------
1  self
2  manager
3  peer
And a table as follows:
data:
id type_lookup_id value
-----------------------
1  1              1
2  1              4
3  2              9
4  2              1
5  2              9
6  1              5
7  2              6
8  1              2
9  1              1
After running a query I would like a result set as follows:
type_lookup_id value
--------------------
1              13
2              25
3              0
I would like all rows in the type_lookup table to be included in the result set - even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
SELECT tl.id AS type_lookup_id, tl.name, COALESCE(SUM(da.value), 0) AS value
from type_lookup tl
left outer join data da
on da.type_lookup_id = tl.id
group by tl.id, tl.name
order by tl.id
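With the sample data above, that should give (the name column simply comes along with the grouping, and COALESCE turns the NULL sum for unmatched rows into 0):
type_lookup_id name    value
----------------------------
1              self    13
2              manager 25
3              peer    0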
[EDIT]
...subsequently edited by changing count() to sum().