Remove table duplicates under certain conditions - kdb

I have a table like below that shows me some pnl by instrument (code) for some shifts, maturity, etc.
Instrument 123 appears two times (2 sets of shift, booknumber, insmat but different pnl). I would like to clean the table to only keep the first set (3 first rows).
> code | shift | pnl | booknumber | insmat
123 -20% 5 1234 2021.01.29
123 -0% 7 1234 2021.01.29
123 +20% 9 1234 2021.01.29
123 -20% 4 1234 2021.01.29
123 -0% 6 1234 2021.01.29
123 +20% 8 1234 2021.01.29
456 -20% 1 1234 2021.01.29
456 -0% 2 1234 2021.01.29
456 +20% 3 1234 2021.01.29
If there were no shifts involved I would do something like this:
select first code, first pnl, first booknumber, first insmat by code from t
Would love to hear if you have a solution!
Thanks!

If the shift pattern is consistently 3 shifts, you could use
q)select from t where 0=i mod 3
code shift pnl booknumber insmat
------------------------------------
123 20 5 1234 2021.01.29
123 20 4 1234 2021.01.29
456 -20 1 1234 2021.01.29
Alternative solution with an fby
q)select from t where shift=(first;shift)fby code
code shift pnl booknumber insmat
------------------------------------
123 20 5 1234 2021.01.29
123 20 4 1234 2021.01.29
456 -20 1 1234 2021.01.29
This will only work if the first shift value is unique within the shift pattern however.

Related

Get distinct values in Pyspark and if duplicate value then should be placed in another column

Input Table:
prod
acct
acctno
newcinsfx
John
A01
1
89
John
A01
2
90
John
A01
2
92
Mary
A02
1
92
Mary
A02
3
81
Desired output table:
prod
acct
newcinsfx1
newcinsfx2
John
A01
89
John
A01
90
92
Mary
A02
92
Mary
A02
81
I tried to do it by distinct function.
df.select('prod',"acctno").distinct()
df.show()

KDB/Q: how to join and fill null with 0

I am joining 2 tables. How do I replace NULL with 0 a column from one of the table?
My code to join
newTable: table1 lj xkey `date`sym xkey table2
I am aware that 0^ helps you to do this, but I dont know how to apply here
In future I recommend that you show examples of the 2 tables you have and the expected outcome you would like because it is slightly difficult to know but I think this might be what you want.
First in your code you use xkey twice so it will throw an error. Change it to be:
newTable: table1 lj `date`sym xkey table2
Then for the updating of null values with a column from another tbl you could do:
q)tbl:([]date:.z.d;sym:10?`abc`xyz;data:10?8 2 0n)
q)tbl
date sym data
-------------------
2020.12.10 xyz 8
2020.12.10 abc 8
2020.12.10 abc 8
2020.12.10 abc
2020.12.10 abc
2020.12.10 xyz 2
2020.12.10 abc 2
2020.12.10 xyz
2020.12.10 xyz
2020.12.10 abc 2
q)tbl2:([date:.z.d;sym:`abc`xyz];data2:2?100)
q)tbl2
date sym| data2
--------------| -----
2020.12.10 abc| 23
2020.12.10 xyz| 46
q)select date,sym,data:data2^data from tbl lj `date`sym xkey tbl2 //Replace null values of data with data2.
date sym data
-------------------
2020.12.10 xyz 8
2020.12.10 abc 8
2020.12.10 abc 8
2020.12.10 abc 23
2020.12.10 abc 23
2020.12.10 xyz 2
2020.12.10 abc 2
2020.12.10 xyz 46
2020.12.10 xyz 46
2020.12.10 abc 2
So, it's
Use within an an update statement, for example:
q)newTable:([]column1:(1;0Nj;2;0Nj))
q)update 0^column1 from newTable
column1
-------
1
0
2
0
Or functional form:
q)newTable:([]column1:(1;0Nj;2;0Nj);column2:(1;2;3;0Nj))
q)parse"update 0^column1 from newTable"
!
`newTable
()
0b
(,`column1)!,(^;0;`column1)
q)![newTable;();0b;raze{enlist[x]!enlist(^;0;x)}each `column1`column2]
column1 column2
---------------
1 1
0 2
2 3
0 0

KDB - Referencing functions from a table

I am new to kdb and researching it for a use case to generate time series data using a table of various function inputs. Each row of the table consists of function inputs keyed by an id and segment and will call one function per row. I have figured out how to identify which function albeit using brute force nested conditions.
My question is 2 part
How does one employ kicking off the execution of these functions?
Once the time series data is generated for each id and segment, how best can the the output be compiled into a singular table (sample output noted below - I have thought about one table for each id and then compile in two steps which would work as well but we'll have thousands of ids)
Below is a sample table and some conditions to add meta data including which function to apply
//Create sample table and add columns to identify unknown and desired function
t:([id:`AAA`AAA`AAA`BBB`CCC;seg:1 2 3 1 1];aa: 1500 0n 400 40 900;bb:0n 200 30 40 0n;cc: .40 .25 0n 0n .35)
t: update Uknown:?[0N = aa;`aa;?[0N = bb;`bb;?[0N = cc;`cc;`UNK]]] from t
t: update Call_Function:?[0N = aa;`Solveaa;?[0N = bb;`Solvebb;?[0N = cc;`Solvecc;`NoFunction]]] from t
A sample function below uses the inputs from table t to generate time series data (limited to 5 periods for example here) and test using X
//dummy function to generate output for first 5 time periods
Solvebb:{[aa;cc]
(aa%cc)*(1-exp(neg cc*1+til 5))
}
//test the function as an example for dummy output in result table below
x: flip enlist Solvebb[1500;.40] //sample output for AAA seg1 from t for example
The result would ideally be a sample table similar to below
t2: `id`seg xkey ("SIIIS";enlist",") 0:`:./Data/sampleOutput.csv
id seg| seg_idx tot_idx result
-------| ------------------------
AAA 1 | 1 1 1,236.30
AAA 1 | 2 2 2,065.02
AAA 1 | 3 3 2,620.52
AAA 1 | 4 4 2,992.89
AAA 1 | 5 5 3,242.49
AAA 2 | 1 6
AAA 2 | 2 7
AAA 2 | 3 8
AAA 2 | 4 9
AAA 2 | 5 10
AAA 3 | 1 11
AAA 3 | 2 12
AAA 3 | 3 13
AAA 3 | 4 14
AAA 3 | 5 15
BBB 1 | 1 1
BBB 1 | 2 2
BBB 1 | 3 3
BBB 1 | 4 4
BBB 1 | 5 5
..
It's difficult without more details, but something like the following may help.
First, it may be easier to define Solvebb so that it can take 3 inputs and simple ignores the middle one:
q)Solvebb:{[aa;bb;cc](aa%cc)*(1-exp(neg cc*1+til 5))}
And adding dummy functions for the other two in your table (NB. it's important for the use of ungroup later that the output of these functions are lists):
q)Solveaa:{[aa;bb;cc] (bb+cc;bb*cc)}
q)Solvecc:{[aa;bb;cc] (aa+bb;aa*bb)}
You can apply each call each function on all three vectors of input with:
q)update result:first[Call_Function]'[aa;bb;cc] by Call_Function from t
id seg| aa bb cc Uknown Call_Function result
-------| -------------------------------------------------------------------------------
AAA 1 | 1500 0.4 bb Solvebb 1236.3 2065.016 2620.522 2992.888 3242.493
AAA 2 | 200 0.25 aa Solveaa 200.25 50
AAA 3 | 400 30 cc Solvecc 430 12000f
BBB 1 | 40 40 cc Solvecc 80 1600f
CCC 1 | 900 0.35 bb Solvebb 759.3735 1294.495 1671.589 1937.322 2124.581
and you can unravel this table by applying the ungroup function
q)ungroup update result:first[Call_Function]'[aa;bb;cc] by Call_Function from t
id seg aa bb cc Uknown Call_Function result
---------------------------------------------------
AAA 1 1500 0.4 bb Solvebb 1236.3
AAA 1 1500 0.4 bb Solvebb 2065.016
AAA 1 1500 0.4 bb Solvebb 2620.522
AAA 1 1500 0.4 bb Solvebb 2992.888
AAA 1 1500 0.4 bb Solvebb 3242.493
AAA 2 200 0.25 aa Solveaa 200.25
AAA 2 200 0.25 aa Solveaa 50
AAA 3 400 30 cc Solvecc 430
AAA 3 400 30 cc Solvecc 12000
BBB 1 40 40 cc Solvecc 80
BBB 1 40 40 cc Solvecc 1600
CCC 1 900 0.35 bb Solvebb 759.3735
CCC 1 900 0.35 bb Solvebb 1294.495
CCC 1 900 0.35 bb Solvebb 1671.589
CCC 1 900 0.35 bb Solvebb 1937.322
CCC 1 900 0.35 bb Solvebb 2124.581

Add unique rows for each group when similar group repeats after certain rows

Hi Can anyone help me please to get unique group number?
I need to give unique rows for each group even when same group repeats after some groups.
I have following data:
id version product startdate enddate
123 0 2443 2010/09/01 2011/01/02
123 1 131 2011/01/03 2011/03/09
123 2 131 2011/08/10 2012/09/10
123 3 3009 2012/09/11 2014/03/31
123 4 668 2014/04/01 2014/04/30
123 5 668 2014/05/01 2016/01/01
123 6 668 2016/01/02 2017/09/08
123 7 131 2017/09/09 2017/10/10
123 8 131 2018/10/11 2019/01/01
123 9 550 2019/01/02 2099/01/01
select *,
dense_rank()over(partition by id order by id,product)
from table
Expected results:
id version product startdate enddate count
123 0 2443 2010/09/01 2011/01/02 1
123 1 131 2011/01/03 2011/03/09 2
123 2 131 2011/08/10 2012/09/10 2
123 3 3009 2012/09/11 2014/03/31 3
123 4 668 2014/04/01 2014/04/30 4
123 5 668 2014/05/01 2016/01/01 4
123 6 668 2016/01/02 2017/09/08 4
123 7 131 2017/09/09 2017/10/10 5
123 8 131 2018/10/11 2019/01/01 5
123 9 550 2019/01/02 2099/01/01 6
Try the following
SELECT
id,version,product,startdate,enddate,
1+SUM(v)OVER(PARTITION BY id ORDER BY version) n
FROM
(
SELECT
*,
IIF(LAG(product)OVER(PARTITION BY id ORDER BY version)<>product,1,0) v
FROM TestTable
) q

Tableau Pivot Rows into Columns

I have a table structure like this:
Department Employee Class Peroid Qty1 Qty2 Qty3
----------------------------------------------------
Dept1 John 1 1st 1 2 3
Dept1 John 1 2nd 11 22 33
Dept1 Mary 1 1st 2 3 4
Dept1 Mary 1 2nd 22 33 44
Dept2 Joe 1 1st 3 4 5
Dept2 Joe 1 2nd 33 44 55
Dept2 Paul 1 1st 4 5 6
Dept2 Paul 1 2nd 44 55 66
In a view I'd like to display the format as such:
Class / Period
1
Department Employee 1st 2nd
----------------------------------------------
Dept1 John 1 2 3 11 22 33
Dept1 Mary 2 3 4 22 33 44
Dept2 Joe 3 4 5 33 44 55
Dept2 Paul 4 5 6 44 55 66
I can't seem to find a way to do this. I have Class, Period as Columns and Department, Employee as Rows then drag Qty1, Qty2, Qty3 to the Text Mark but the format becomes:
Class / Period
1
Department Employee 1st 2nd
----------------------------------------------
Dept1 John 1 11
2 22
3 33
Dept1 Mary 2 22
3 33
4 44
Dept2 Joe 3 33
4 44
5 55
Dept2 Paul 4 44
5 55
6 66
How do I turn those rows under each employee to sub-columns under Period?
I think this is what you're trying to achieve.
A lot of times when you see a repeating column in a database table, Qty1, Qty2, Qty3, it is a sign that you really want multiple rows each with a single Qty (and repeating the other information) -- At least when you are building reports. That way you can have rows with any number of instances of Qty, and you can also easily aggregate all the Qty together when needed.
There are situations where you may want to stick with a repeating field design. But if you do want to reshape the data, you can do that in Tableau's data connection window by selecting the columns you want to pull out into a single field and selecting the pivot command.