how to rank the column values of each group - kdb

I have a table.
I want to set the ranks based on max volume within each group, i.e. per date.
If volume is null, then don't rank it. Keep the rank column empty for a null volume (for an example, see lines 11 and 12 in the expected output snapshot).
Rank 1 is our front contract. If the sym has flipped, it cannot be rank 1 again after the flip (for examples, see lines 9, 13 and 15 in the output snapshot).
expected output is
To generate the sample table, use the code below.
tab:([]date:`date$();sym:`symbol$();name:`symbol$();volume:`float$();roll_rank:`int$());
`tab insert (2010.01.01;`ESH22;`ES;100.1;0Ni);
`tab insert (2010.01.01;`ESH23;`ES;500.1;0Ni);
`tab insert (2010.01.02;`ESH22;`ES;100.1;0Ni);
`tab insert (2010.01.02;`ESH23;`ES;800.1;0Ni);
`tab insert (2010.01.02;`ESH24;`ES;600.1;0Ni);
`tab insert (2010.01.02;`ESH25;`ES;550.1;0Ni);
`tab insert (2010.01.02;`ESH26;`ES;200.1;0Ni);
`tab insert (2010.01.03;`ESH23;`ES;600.1;0Ni);
`tab insert (2010.01.03;`ESH24;`ES;700.1;0Ni);
`tab insert (2010.01.03;`ESH26;`ES;0n;0Ni);
`tab insert (2010.01.03;`ESH25;`ES;500.1;0Ni);
`tab insert (2010.01.03;`ESH26;`ES;0n;0Ni);
`tab insert (2010.01.04;`ESH23;`ES;50.1;0Ni);
`tab insert (2010.01.05;`ESH23;`ES;300.1;0Ni);
`tab insert (2010.01.05;`ESH24;`ES;800.1;0Ni);
`tab insert (2010.01.05;`ESH25;`ES;100.1;0Ni);

The following sorts the volumes in descending order within each date, with the rank number in a separate column:
q)ungroup select volume:desc volume,ranknumber:1+til count volume by date from tab
Code output with the provided table data:
date volume ranknumber
----------------------------
2010.01.01 500.1 1
2010.01.01 100.1 2
2010.01.02 800.1 1
2010.01.02 600.1 2
2010.01.02 550.1 3
2010.01.02 200.1 4
2010.01.02 100.1 5
2010.01.03 700.1 1
2010.01.03 600.1 2
2010.01.03 500.1 3
2010.01.03 4
2010.01.03 5
2010.01.04 50.1 1
2010.01.05 800.1 1
2010.01.05 300.1 2
2010.01.05 100.1 3
Haven't thought of an elegant way of not including the null values in the rank order yet.
Edit: You could use "update" on the sorted table to blank out the rank for the null volumes. Something like this would work (where tab2 is the previous output):
q)update ranknumber:0N from tab2 where null volume
date volume ranknumber
----------------------------
2010.01.01 500.1 1
2010.01.01 100.1 2
2010.01.02 800.1 1
2010.01.02 600.1 2
2010.01.02 550.1 3
2010.01.02 200.1 4
2010.01.02 100.1 5
2010.01.03 700.1 1
2010.01.03 600.1 2
2010.01.03 500.1 3
2010.01.03
2010.01.03
2010.01.04 50.1 1
2010.01.05 800.1 1
2010.01.05 300.1 2
2010.01.05 100.1 3
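For readers less familiar with q, the same idea (rank by descending volume within each date, leaving null volumes unranked) can be sketched in plain Python. This is only an illustration, not the q solution itself: the function name `rank_by_volume` and the tuple layout are assumptions, `None` stands in for a q null, and the front-contract "flip" rule from the question is not handled.

```python
from collections import defaultdict

# A few (date, sym, volume) rows from the sample table; None marks a null volume.
rows = [
    ("2010.01.03", "ESH23", 600.1),
    ("2010.01.03", "ESH24", 700.1),
    ("2010.01.03", "ESH26", None),
    ("2010.01.03", "ESH25", 500.1),
    ("2010.01.04", "ESH23", 50.1),
]

def rank_by_volume(rows):
    """Return (date, sym, volume, rank) tuples; rank is None for null volumes."""
    by_date = defaultdict(list)
    for date, sym, volume in rows:
        by_date[date].append((sym, volume))
    ranked = []
    for date in sorted(by_date):
        group = by_date[date]
        # Rank only the non-null volumes, largest first.
        non_null = sorted((g for g in group if g[1] is not None),
                          key=lambda g: g[1], reverse=True)
        for rank, (sym, volume) in enumerate(non_null, start=1):
            ranked.append((date, sym, volume, rank))
        # Null volumes are kept in the output but left unranked.
        ranked.extend((date, sym, None, None) for sym, v in group if v is None)
    return ranked

result = rank_by_volume(rows)
```

Ranking the non-null rows first and appending the null rows afterwards avoids the second "update" pass, because the nulls never receive a rank in the first place.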

Related

calculate roll_rank using prev value for each row

I have a table where we need to set roll_rank as a running sum, except for the rows where roll_rank is 0, 1 or 2.
We don't need to touch the rows where roll_rank is in 0,1,2.
We want to calculate the sums of roll_rank by date for the rows where roll_rank is not in 0,1,2.
Example table:
tmp:([]date:`date$();name:`symbol$();roll_rank:`int$())
`tmp insert (2010.01.01;`sym1;1);
`tmp insert (2010.01.01;`sym2;2);
`tmp insert (2010.01.01;`sym3;0Ni);
`tmp insert (2010.01.01;`sym4;0Ni);
`tmp insert (2010.01.02;`sym1;0);
`tmp insert (2010.01.02;`sym2;1);
`tmp insert (2010.01.02;`sym3;2);
`tmp insert (2010.01.02;`sym4;0Ni);
`tmp insert (2010.01.02;`sym5;0Ni);
`tmp insert (2010.01.02;`sym6;0Ni);
`tmp insert (2010.01.03;`sym1;1);
`tmp insert (2010.01.03;`sym2;0Ni);
`tmp insert (2010.01.03;`sym3;0Ni);
`tmp insert (2010.01.03;`sym4;0Ni);
Expected output is
This might also achieve your desired result:
update sums 1^deltas roll_rank by date from tmp
One method using a vector conditional and over:
q){update{?[null x;1+prev x;x]}roll_rank from x}/[tmp]
date name roll_rank
-------------------------
2010.01.01 sym1 1
2010.01.01 sym2 2
2010.01.01 sym3 3
2010.01.01 sym4 4
2010.01.02 sym1 0
2010.01.02 sym2 1
2010.01.02 sym3 2
2010.01.02 sym4 3
2010.01.02 sym5 4
2010.01.02 sym6 5
2010.01.03 sym1 1
2010.01.03 sym2 2
2010.01.03 sym3 3
2010.01.03 sym4 4
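The vector conditional with over keeps replacing nulls with "previous value plus one" until nothing changes. As a rough illustration of what that converges to, here is a single-pass sketch in plain Python; the function name `fill_roll_rank` is an assumption and `None` stands in for `0Ni`. Like the over version, it does not group by date explicitly and relies on each date starting with a non-null roll_rank, which holds for the example table.

```python
def fill_roll_rank(ranks):
    """Replace each None with the previous (already filled) value plus one."""
    filled = []
    for r in ranks:
        # A leading None has no predecessor and is left as None.
        if r is None and filled and filled[-1] is not None:
            r = filled[-1] + 1
        filled.append(r)
    return filled

# roll_rank values for 2010.01.01 and 2010.01.02 from the example table
ranks = [1, 2, None, None, 0, 1, 2, None, None, None]
filled = fill_roll_rank(ranks)
```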

TSQL, Pivot rows into single columns

I previously had to solve something similar with a pivot and flatten.
I want to do the same thing on the example below, but it is slightly different because there are no ranks.
In my previous example, the table looked like this:
LocationID Code Rank
1 123 1
1 124 2
1 138 3
2 999 1
2 888 2
2 938 3
And I was able to use this function to properly get my rows in a single column.
-- Check if tables exist, delete if they do so that you can start fresh.
IF OBJECT_ID('tempdb.dbo.#tbl_Location_Taxonomy_Pivot_Table', 'U') IS NOT NULL
DROP TABLE #tbl_Location_Taxonomy_Pivot_Table;
IF OBJECT_ID('tbl_Location_Taxonomy_NPPES_Flattened', 'U') IS NOT NULL
DROP TABLE tbl_Location_Taxonomy_NPPES_Flattened;
-- Pivot the original table so that you have
SELECT *
INTO #tbl_Location_Taxonomy_Pivot_Table
FROM [MOAD].[dbo].[tbl_Location_Taxonomy_NPPES] tax
PIVOT (MAX(tax.tbl_lkp_Taxonomy_Seq)
FOR tax.Taxonomy_Rank in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15])) AS pvt
-- ORDER BY Location_ID
-- Flatten the tables.
SELECT Location_ID
,max(piv.[1]) as Tax_Seq_1
,max(piv.[2]) as Tax_Seq_2
,max(piv.[3]) as Tax_Seq_3
,max(piv.[4]) as Tax_Seq_4
,max(piv.[5]) as Tax_Seq_5
,max(piv.[6]) as Tax_Seq_6
,max(piv.[7]) as Tax_Seq_7
,max(piv.[8]) as Tax_Seq_8
,max(piv.[9]) as Tax_Seq_9
,max(piv.[10]) as Tax_Seq_10
,max(piv.[11]) as Tax_Seq_11
,max(piv.[12]) as Tax_Seq_12
,max(piv.[13]) as Tax_Seq_13
,max(piv.[14]) as Tax_Seq_14
,max(piv.[15]) as Tax_Seq_15
-- JOIN HERE
INTO tbl_Location_Taxonomy_NPPES_Flattened
FROM #tbl_Location_Taxonomy_Pivot_Table piv
GROUP BY Location_ID
So, then here is the data I would like to work with in this example.
LocationID Foreign Key
2 2
2 670
2 2902
2 5389
3 3
3 722
3 2905
3 5561
I have used pivot on data like this before, but the difference was that it also had a rank. Is there a way to get my foreign keys to show up in this format using a pivot?
locationID FK1 FK2 FK3 FK4
2 2 670 2902 5389
3 3 722 2905 5561
Another way I could look at doing this: I also have the values in this form:
LocationID Address_Seq
2 670, 5389, 2902, 2,
3 722, 5561, 2905, 3
etc
Is there any way I can get this into the same layout?
ID Col1 Col2 Col3 Col4
2 670 5389 2902 2
Adding a rank column and reversing the order should give you what you require:
SELECT locationid, [4] col1, [3] col2, [2] col3, [1] col4
FROM
(
SELECT locationid, foreignkey,rank from #Pivot_Table ----- temp table with a rank column
) x
PIVOT (MAX(x.foreignkey)
FOR x.rank in ([4],[3],[2],[1]) ) pvt
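The answer assumes a temp table that already has a rank column. One way such a rank could be generated is with ROW_NUMBER() partitioned by location. The sketch below runs that idea through Python's built-in sqlite3 module so it is self-contained. It is an illustration under stated assumptions: the table name `loc_fk` is hypothetical, SQLite stands in for SQL Server, and because SQLite has no PIVOT operator the pivot step is swapped for conditional aggregation (MAX over CASE). Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc_fk (locationid INTEGER, foreignkey INTEGER)")
conn.executemany(
    "INSERT INTO loc_fk VALUES (?, ?)",
    [(2, 2), (2, 670), (2, 2902), (2, 5389),
     (3, 3), (3, 722), (3, 2905), (3, 5561)],
)

query = """
WITH ranked AS (
    -- Number the foreign keys within each location, smallest first.
    SELECT locationid,
           foreignkey,
           ROW_NUMBER() OVER (PARTITION BY locationid
                              ORDER BY foreignkey) AS rn
    FROM loc_fk
)
-- Conditional aggregation: one output column per row number.
SELECT locationid,
       MAX(CASE WHEN rn = 1 THEN foreignkey END) AS fk1,
       MAX(CASE WHEN rn = 2 THEN foreignkey END) AS fk2,
       MAX(CASE WHEN rn = 3 THEN foreignkey END) AS fk3,
       MAX(CASE WHEN rn = 4 THEN foreignkey END) AS fk4
FROM ranked
GROUP BY locationid
ORDER BY locationid
"""
pivoted = conn.execute(query).fetchall()
```

In SQL Server, the `ranked` subquery could feed the PIVOT from the answer in place of the temp table. The ORDER BY inside the window decides which key lands in which column.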

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order, which is 0 where order_rank = 1 and otherwise the number of days since the previous order (the one with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 0
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
Use the LAG window function:
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0) AS days_since_last_order
FROM <table_name>
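To make it concrete what LAG with PARTITION BY customer_id ORDER BY order_date computes, here is the same logic spelled out in plain Python. This is a sketch for checking the expected numbers, not a replacement for the query; the function name and the tuple layout are assumptions.

```python
from datetime import date

orders = [
    ("A", date(2017, 2, 19)), ("A", date(2017, 2, 24)), ("A", date(2017, 3, 31)),
    ("A", date(2017, 7, 3)), ("A", date(2017, 8, 10)),
    ("B", date(2016, 4, 24)), ("B", date(2016, 4, 30)),
    ("C", date(2016, 7, 18)), ("C", date(2016, 9, 1)), ("C", date(2016, 9, 13)),
]

def days_since_last_order(orders):
    """Return (customer_id, order_date, days) with 0 for each customer's first order."""
    out = []
    last_seen = {}  # most recent order date per customer (the "lagged" value)
    for customer, order_date in sorted(orders):
        previous = last_seen.get(customer)
        # No previous order for this customer: mirror COALESCE(..., 0).
        days = 0 if previous is None else (order_date - previous).days
        out.append((customer, order_date, days))
        last_seen[customer] = order_date
    return out

gaps = [days for _, _, days in days_since_last_order(orders)]
```

Because the previous date is tracked per customer, the first order of each customer gets 0 rather than a gap measured from another customer's last order.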

Difference of dates using lag function postgres

I have customer ID and transaction Date(yyyy-mm-dd) as shown below
Cust_id Trans_date
1 2017-01-01
1 2017-01-03
1 2017-01-06
2 2017-01-01
2 2017-01-04
2 2017-01-05
I need to find the difference in days between consecutive transactions for each Cust_id.
I tried date_diff and extract with the lag function, but I am getting the error
function lag(timestamp without time zone) may only be called as a window function
I am looking for a result like the one below
Cust_id Trans_date difference
1 2017-01-01 0
1 2017-01-03 2
1 2017-01-06 3
2 2017-01-01 0
2 2017-01-04 3
2 2017-01-05 1
How can I find the difference in PostgreSQL?
Is this what you want?
with t(Cust_id,Trans_date) as(
select 1 ,'2017-01-01'::timestamp union all
select 1 ,'2017-01-03'::timestamp union all
select 1 ,'2017-01-06'::timestamp union all
select 2 ,'2017-01-01'::timestamp union all
select 2 ,'2017-01-04'::timestamp union all
select 2 ,'2017-01-05'::timestamp
)
select
Cust_id,
Trans_date,
coalesce(Trans_date::date - lag(Trans_date::date) over(partition by Cust_id order by Trans_date), 0) as difference
from t;

How to generate a date to be included in UNPIVOT results without a loop?

Say I had an example like so, where I'm transposing columns into rows with UNPIVOT.
DECLARE @pvt AS TABLE (VendorID int, Emp1 int, Emp2 int, Emp3 int, Emp4 int, Emp5 int);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (1,4,3,5,4,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (2,4,1,5,5,5);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (3,4,3,5,4,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (4,4,2,5,5,4);
INSERT INTO @pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (5,5,1,5,5,5);
--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM @pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO
Which produces results like this
VendorID Employee Orders
1 Emp1 4
1 Emp2 3
1 Emp3 5
1 Emp4 4
1 Emp5 4
2 Emp1 4
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
However, I want to include an "incremental" date, so that it repeats in a group for each Vendor, and the results would be like this
VendorID Employee Orders OrderDate
1 Emp1 4 01/01/2014
1 Emp2 3 02/01/2014
1 Emp3 5 03/01/2014
1 Emp4 4 04/01/2014
1 Emp5 4 05/01/2014
2 Emp1 4 ..
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
The kicker is that I want to try to do this without resorting to a loop since the transposed results are going to be about 100K records. Is there a way to generate that date field like that without looping over the results?
[edit]
I think, but am not sure yet, that this post might help, using ROW_NUMBER
You can use:
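DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY VendorID ORDER BY Employee), @startdate)
Following your example, this partitions by VendorID and orders by Employee, but you can change the ordering just as you would a regular ORDER BY. Note that ROW_NUMBER() starts at 1, so subtract 1 from it if the first row of each vendor should get the start date itself.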
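The row-number-as-day-offset idea can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions rather than the T-SQL itself: SQLite stands in for SQL Server, so DATEADD is swapped for SQLite's date() with a '+N days' modifier. The table name `unpvt`, the column names and the start date 2014-01-01 are all hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unpvt (vendor_id INTEGER, employee TEXT, orders INTEGER)")
conn.executemany(
    "INSERT INTO unpvt VALUES (?, ?, ?)",
    [(1, "Emp1", 4), (1, "Emp2", 3), (1, "Emp3", 5),
     (2, "Emp1", 4), (2, "Emp2", 1)],
)

query = """
SELECT vendor_id,
       employee,
       orders,
       -- Row number restarts at 1 for each vendor; minus 1 so the first row
       -- of every vendor gets the start date itself.
       date(:start,
            '+' || (ROW_NUMBER() OVER (PARTITION BY vendor_id
                                       ORDER BY employee) - 1) || ' days') AS order_date
FROM unpvt
ORDER BY vendor_id, employee
"""
dated = conn.execute(query, {"start": "2014-01-01"}).fetchall()
```

The whole result is produced by one set-based query, so no loop over the unpivoted rows is needed.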