Pivot table with multiple value columns in KDB+ - kdb

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
To the table below with single row:
tb1: ([] enlist time: 2010.01.01; enlist price_buy:100; enlist price_sell:101; enlist size_buy:30; enlist size_sell:50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?

(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100 30 101 50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have the format date time sym etc
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101 50 100 30

The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100 101 30 50

I have a pivot function in github . But it doesn't support multiple columns
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join to achieve expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.

Related

T-SQL vlookup with fake calendar table?

I am rather new in T-SQL and I have to create a view, where the output will be as shown below:
enter image description here
But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?
This is great question about a very important topic that, even many experienced developers need to touch up on. Being "relatively new at SQL" I wont just offer a solution, I'll explain the key concepts involved.
The Auxiliary Table Numbers
First lets learn about what a tally table, aka numbers table is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructer which allows for a list of values to be treated like a view. This returns:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code into a CTE named e. Think of CTEs as inline unnamed views that you can reference multiple times. Now lets CROSS JOIN e to itself. This returns for 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy values.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using as an auxiliary numbers table
#OneToTen is a table with random numbers 1 to 10. I need to count how many there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE #OneToTen TABLE (N INT);
INSERT #OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from #OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN #OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN #OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show you the wrong and right way to do this. Note how COUNT(*) is wrong for this, you need COUNT(whatever you are counting).
Auxiliary table of Dates (AKA calendar table)
My we use our numbers table to create a calendar table.
;--== 3. Auxilliary Month/Year Calendar Table
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
The Auxiliary Customer table
Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.
SAMPLE DATA:
;--== Sample Data
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
Distinct list of customers for an "Auxilliary Table of Customers"
SELECT DISTINCT s.Customer FROM #sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from #sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxilliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxilliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxilliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM #sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN #sale AS s -- 6.5. Join #sale to Junk Dimension on Year,Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00

Pass a table as condition with where clause in kdb

I have a table t:
t:([] sym:`GOOG`AMZN; px:10 20; vol:100 200);
Is it possible to pass a sub-table as a where clause condition to the table?
Below query throws type error:
select from t where ([] sym:enlist `GOOG; px:enlist 10)
Yes, it is possible:
q)select from t where([]sym;px) in ([] sym:enlist `GOOG; px:enlist 10)
sym px vol
-----------
GOOG 10 100
Update: however, if t is large this should be much faster:
q)([] sym:enlist `GOOG; px:enlist 10)#2!t
sym px| vol
-------| ---
GOOG 10| 100

kdb: dynamically denormalize a table (convert key values to column names)

I have a table like this:
q)t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);server:(`S01`S02`S01`S02)];volume:(20;10;30;50))
q)t
sym server| volume
-------------| ------
EURUSD S01 | 20
EURUSD S02 | 10
AUDUSD S01 | 30
AUDUSD S02 | 50
I need to de-normalize it to display the data nicely. The resulting table should look like this:
sym | S01 S02
------| -------
EURUSD| 20 10
AUDUSD| 30 50
How do I dynamically convert the original table using distinct values from server column as column names for the new table?
Thanks!
Basically you want 'pivot' table. Following page has a very good solution for your problem:
http://code.kx.com/q/cookbook/pivoting-tables/
Here are the commands to get the required table:
q) P:asc exec distinct server from t
q) exec P#(server!volume) by sym:sym from t
One tricky thing around pivoting a table is - the keys of the dictionary should be of type symbol otherwise it won't generate the pivot table structure.
E.g. In the following table, we have a column dt with type as date.
t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);dt:(0 1 0 1+.z.d)];volume:(20;10;30;50))
Now if we want to pivot it with columns as dates , it will generate a structure like :
q)P:asc exec distinct dt from t
q)exec P#(dt!volume) by sym:sym from t
(`s#flip (enlist `sym)!enlist `s#`AUDUSD`EURUSD)!((`s#2018.06.22 2018.06.23)!30j, 50j;(`s#2018.06.22 2018.06.23)!20j, 10j)
To get the dates as the columns , the dt column has to be typecasted to symbol :
show P:asc exec distinct `$string date from t
`s#`2018.06.22`2018.06.23
q)exec P#((`$string date)!volume) by sym:sym from t
sym | 2018.06.22 2018.06.23
------| ---------------------
AUDUSD| 30 50
EURUSD| 20 10

SELECT record based upon dates

Assuming data such as the following:
ID EffDate Rate
1 12/12/2011 100
1 01/01/2012 110
1 02/01/2012 120
2 01/01/2012 40
2 02/01/2012 50
3 01/01/2012 25
3 03/01/2012 30
3 05/01/2012 35
How would I find the rate for ID 2 as of 1/15/2012?
Or, the rate for ID 1 for 1/15/2012?
In other words, how do I do a query that finds the correct rate when the date falls between the EffDate for two records? (Rate should be for the date prior to the selected date).
Thanks,
John
How about this:
SELECT Rate
FROM Table1
WHERE ID = 1 AND EffDate = (
SELECT MAX(EffDate)
FROM Table1
WHERE ID = 1 AND EffDate <= '2012-15-01');
Here's an SQL Fiddle to play with. I assume here that 'ID/EffDate' pair is unique for all table (at least the opposite doesn't make sense).
SELECT TOP 1 Rate FROM the_table
WHERE ID=whatever AND EffDate <='whatever'
ORDER BY EffDate DESC
if I read you right.
(edited to suit my idea of ms-sql which I have no idea about).

need help writing a date sensitive T-SQL query

I need help writing a T-SQL query that will generate 52 rows of data per franchise from a table that will often contain gaps in the 52 week sequence per franchise (i.e., the franchise may have reported data bi-weekly or has not been in business for a full year).
The table I'm querying against looks something like this:
FranchiseId | Date | ContractHours | PrivateHours
and I need to join it to a table similar to this:
FranchiseId | Name
The output of the query needs to look like this:
Name | Date | ContractHours | PrivateHours
---- ---------- ------------- ------------
AZ1 08-02-2011 292 897
AZ1 07-26-2011 0 0 -- default to 0's for gaps in sequence
...
AZ1 08-03-2010 45 125 -- row 52 for AZ1
AZ2 08-02-2011 382 239
...
AZ2 07-26-2011 0 0 -- row 52 for AZ2
I need this style of output for every franchise, i.e., 52 rows of data with default rows for any gaps in the 52 week sequence, in a single result set. Thus, if there are 100 franchises, the result set should be 5200 rows.
What I've Tried
I've tried the typical suggestions of:
Create a table with all possible dates
LEFT OUTER JOIN this to the table of data needed
The problems I'm running into are
ensuring that for every franchise their are 52 rows and
filling in gaps with the franchise name and 0 for hours, I can't
have the following in the result set:
Name | Date | ContractHours | PrivateHours
---- ---------- ------------- ------------
NULL 08-02-2011 NULL NULL
I don't know where to go from here? Is there an efficient way to write a T-SQL query that will produce the required output?
The bare bones is this
Generate 52 week ranges
Cross join with Franchise
LEFT JOIN the actual date
ISNULL to substitute zeroes
So, like this, untested
;WITH cDATE AS
(
SELECT
CAST('20100101' AS date /*smalldatetime*/) AS StartOfWeek,
CAST('20100101' AS date /*smalldatetime*/) + 6 AS EndOfWeek
UNION ALL
SELECT StartOfWeek + 7, EndOfWeek + 7
FROM cDATE WHERE StartOfWeek + 7 < '20110101'
), Possibles AS
(
SELECT
StartOfWeek, FranchiseID
FROM
cDATE CROSS JOIN Franchise
)
SELECT
P.FranchiseID,
P.StartOfWeek,
ISNULL(SUM(O.ContractHours), 0),
ISNULL(SUM(O.PrivateHours), 0)
FROM
Possibles P
LEFT JOIN
TheOtherTable O ON P.FranchiseID = O.FranchiseID AND
O.Date BETWEEN P.StartOfWeek AND P.EndOfWeek
GROUP BY
P.FranchiseID