KDB/Q: how to join and fill null with 0

I am joining 2 tables. How do I replace the NULLs in a column from one of the tables with 0?
My code to join them is:
newTable: table1 lj xkey `date`sym xkey table2
I am aware that 0^ helps you do this, but I don't know how to apply it here.

In future I recommend that you show examples of the 2 tables you have and the expected outcome you would like, because it is difficult to know exactly what you need, but I think this might be what you want.
First, in your code you use xkey twice, so it will throw an error. Change it to:
newTable: table1 lj `date`sym xkey table2
Then, to fill the null values with a column from another table, you could do:
q)tbl:([]date:.z.d;sym:10?`abc`xyz;data:10?8 2 0n)
q)tbl
date       sym data
-------------------
2020.12.10 xyz 8
2020.12.10 abc 8
2020.12.10 abc 8
2020.12.10 abc
2020.12.10 abc
2020.12.10 xyz 2
2020.12.10 abc 2
2020.12.10 xyz
2020.12.10 xyz
2020.12.10 abc 2
q)tbl2:([date:.z.d;sym:`abc`xyz];data2:2?100)
q)tbl2
date       sym| data2
--------------| -----
2020.12.10 abc| 23
2020.12.10 xyz| 46
q)select date,sym,data:data2^data from tbl lj `date`sym xkey tbl2 //Replace null values of data with data2.
date       sym data
-------------------
2020.12.10 xyz 8
2020.12.10 abc 8
2020.12.10 abc 8
2020.12.10 abc 23
2020.12.10 abc 23
2020.12.10 xyz 2
2020.12.10 abc 2
2020.12.10 xyz 46
2020.12.10 xyz 46
2020.12.10 abc 2
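If what you actually want is 0 in place of the nulls (as in the original question) rather than the value from data2, the same join works with a 0 fill; a minimal sketch using the example tables above:
q)update 0^data from tbl lj `date`sym xkey tbl2 /nulls in data become 0 instead of being filled from data2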

Use 0^ within an update statement, for example:
q)newTable:([]column1:(1;0Nj;2;0Nj))
q)update 0^column1 from newTable
column1
-------
1
0
2
0
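Note that update here returns a filled copy of the table; if you want to amend the table itself, you can pass it by name (a minimal sketch):
q)update 0^column1 from `newTable /amends newTable in place and returns its name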
Or functional form:
q)newTable:([]column1:(1;0Nj;2;0Nj);column2:(1;2;3;0Nj))
q)parse"update 0^column1 from newTable"
!
`newTable
()
0b
(,`column1)!,(^;0;`column1)
q)![newTable;();0b;raze{enlist[x]!enlist(^;0;x)}each `column1`column2]
column1 column2
---------------
1       1
0       2
2       3
0       0
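If the list of columns to fill varies, the same functional form can be wrapped in a small helper; fill0 is just an illustrative name, not a built-in:
q)fill0:{[t;cs]![t;();0b;cs!{(^;0;x)}each cs]}
q)fill0[newTable;`column1`column2] /same result as the functional update above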

Related

Get distinct values in Pyspark and if duplicate value then should be placed in another column

Input Table:
prod acct acctno newcinsfx
John A01  1      89
John A01  2      90
John A01  2      92
Mary A02  1      92
Mary A02  3      81
Desired output table:
prod acct newcinsfx1 newcinsfx2
John A01  89
John A01  90         92
Mary A02  92
Mary A02  81
I tried to do it with the distinct function:
df.select('prod',"acctno").distinct()
df.show()

Remove table duplicates under certain conditions

I have a table like below that shows me some pnl by instrument (code) for some shifts, maturity, etc.
Instrument 123 appears twice (two sets of shift, booknumber and insmat, but with different pnl). I would like to clean the table to keep only the first set (the first three rows).
code shift pnl booknumber insmat
123  -20%  5   1234       2021.01.29
123  -0%   7   1234       2021.01.29
123  +20%  9   1234       2021.01.29
123  -20%  4   1234       2021.01.29
123  -0%   6   1234       2021.01.29
123  +20%  8   1234       2021.01.29
456  -20%  1   1234       2021.01.29
456  -0%   2   1234       2021.01.29
456  +20%  3   1234       2021.01.29
If there were no shifts involved I would do something like this:
select first code, first pnl, first booknumber, first insmat by code from t
Would love to hear if you have a solution!
Thanks!
If the shift pattern is consistently 3 shifts, you could use
q)select from t where 0=i mod 3
code shift pnl booknumber insmat
------------------------------------
123  20    5   1234       2021.01.29
123  20    4   1234       2021.01.29
456  -20   1   1234       2021.01.29
Alternative solution with an fby
q)select from t where shift=(first;shift)fby code
code shift pnl booknumber insmat
------------------------------------
123  20    5   1234       2021.01.29
123  20    4   1234       2021.01.29
456  -20   1   1234       2021.01.29
However, this will only work if the first shift value is unique within the shift pattern.
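If the pattern is not always three shifts, one possible alternative (a sketch, assuming the goal is to keep each row that is the first occurrence of its code and shift combination) is a multi-column fby on the virtual row index:
q)select from t where i=(first;i) fby ([]code;shift)
For the example data this keeps the first three rows of instrument 123 and all three rows of 456.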

Unpivot data in PostgreSQL

I have a table in PostgreSQL with the below values,
empid hyderabad bangalore mumbai chennai
1 20 30 40 50
2 10 20 30 40
And my output should be like below
empid city nos
1 hyderabad 20
1 bangalore 30
1 mumbai 40
1 chennai 50
2 hyderabad 10
2 bangalore 20
2 mumbai 30
2 chennai 40
How can I do this unpivot in PostgreSQL?
You can use a lateral join:
select t.empid, x.city, x.nos
from the_table t
cross join lateral (
    values
        ('hyderabad', t.hyderabad),
        ('bangalore', t.bangalore),
        ('mumbai', t.mumbai),
        ('chennai', t.chennai)
) as x(city, nos)
order by t.empid, x.city;
Or this one: simpler to read, and plain standard SQL ...
WITH
  input(empid,hyderabad,bangalore,mumbai,chennai) AS (
            SELECT 1,20,30,40,50
  UNION ALL SELECT 2,10,20,30,40
  )
,
  i(i) AS (
            SELECT 1
  UNION ALL SELECT 2
  UNION ALL SELECT 3
  UNION ALL SELECT 4
  )
SELECT
  empid
, CASE i
    WHEN 1 THEN 'hyderabad'
    WHEN 2 THEN 'bangalore'
    WHEN 3 THEN 'mumbai'
    WHEN 4 THEN 'chennai'
    ELSE 'unknown'
  END AS city
, CASE i
    WHEN 1 THEN hyderabad
    WHEN 2 THEN bangalore
    WHEN 3 THEN mumbai
    WHEN 4 THEN chennai
    ELSE NULL::INT
  END AS nos
FROM input CROSS JOIN i
ORDER BY empid,i;
-- out  empid |   city    | nos
-- out -------+-----------+-----
-- out      1 | hyderabad |  20
-- out      1 | bangalore |  30
-- out      1 | mumbai    |  40
-- out      1 | chennai   |  50
-- out      2 | hyderabad |  10
-- out      2 | bangalore |  20
-- out      2 | mumbai    |  30
-- out      2 | chennai   |  40

Add unique rows for each group when similar group repeats after certain rows

Hi, can anyone please help me to get a unique group number?
I need a number that is unique for each group of rows, even when the same group value repeats after some other groups.
I have the following data:
id version product startdate enddate
123 0 2443 2010/09/01 2011/01/02
123 1 131 2011/01/03 2011/03/09
123 2 131 2011/08/10 2012/09/10
123 3 3009 2012/09/11 2014/03/31
123 4 668 2014/04/01 2014/04/30
123 5 668 2014/05/01 2016/01/01
123 6 668 2016/01/02 2017/09/08
123 7 131 2017/09/09 2017/10/10
123 8 131 2018/10/11 2019/01/01
123 9 550 2019/01/02 2099/01/01
select *,
dense_rank()over(partition by id order by id,product)
from table
Expected results:
id version product startdate enddate count
123 0 2443 2010/09/01 2011/01/02 1
123 1 131 2011/01/03 2011/03/09 2
123 2 131 2011/08/10 2012/09/10 2
123 3 3009 2012/09/11 2014/03/31 3
123 4 668 2014/04/01 2014/04/30 4
123 5 668 2014/05/01 2016/01/01 4
123 6 668 2016/01/02 2017/09/08 4
123 7 131 2017/09/09 2017/10/10 5
123 8 131 2018/10/11 2019/01/01 5
123 9 550 2019/01/02 2099/01/01 6
Try the following
SELECT
  id,version,product,startdate,enddate,
  1+SUM(v)OVER(PARTITION BY id ORDER BY version) n
FROM
(
  SELECT
    *,
    IIF(LAG(product)OVER(PARTITION BY id ORDER BY version)<>product,1,0) v
  FROM TestTable
) q

Select from table removing similar rows - PostgreSQL

There is a table with document revisions and authors. Looks like this:
doc_id rev_id rev_date editor title,content so on....
123 1 2016-01-01 03:20 Bill ......
123 2 2016-01-01 03:40 Bill
123 3 2016-01-01 03:50 Bill
123 4 2016-01-01 04:10 Bill
123 5 2016-01-01 08:40 Alice
123 6 2016-01-01 08:41 Alice
123 7 2016-01-01 09:00 Bill
123 8 2016-01-01 10:40 Cate
942 9 2016-01-01 11:10 Alice
942 10 2016-01-01 11:15 Bill
942 15 2016-01-01 11:17 Bill
I need to find the moments when the document was transferred to another editor, i.e. only the first row of every editing series.
Like so:
doc_id rev_id rev_date editor title,content so on....
123 1 2016-01-01 03:20 Bill ......
123 5 2016-01-01 08:40 Alice
123 7 2016-01-01 09:00 Bill
123 8 2016-01-01 10:40 Cate
942 9 2016-01-01 11:10 Alice
942 10 2016-01-01 11:15 Bill
If I use DISTINCT ON (doc_id, editor), it re-sorts the table and I see only one row per doc and editor, which is incorrect.
Of course I could dump everything and filter it with shell tools like awk | sort | uniq, but that is not good for big tables.
Window functions like FIRST_ROW do not help much, because I cannot partition by doc_id, editor without mixing all the series together.
How can I do this better?
Thank you.
You can use lag() to get the previous value, and then a simple comparison:
select t.*
from (select t.*,
             lag(editor) over (partition by doc_id order by rev_date) as prev_editor
      from t
     ) t
where prev_editor is null or prev_editor <> editor;