kdb: dynamically denormalize a table (convert key values to column names)

I have a table like this:
q)t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);server:(`S01`S02`S01`S02)];volume:(20;10;30;50))
q)t
sym    server| volume
-------------| ------
EURUSD S01   | 20
EURUSD S02   | 10
AUDUSD S01   | 30
AUDUSD S02   | 50
I need to de-normalize it to display the data nicely. The resulting table should look like this:
sym   | S01 S02
------| -------
EURUSD| 20  10
AUDUSD| 30  50
How do I dynamically convert the original table using distinct values from server column as column names for the new table?
Thanks!

Basically you want a pivot table. The following page has a very good solution for your problem:
http://code.kx.com/q/cookbook/pivoting-tables/
Here are the commands to get the required table:
q) P:asc exec distinct server from t
q) exec P#(server!volume) by sym:sym from t
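For readers who do not read q, the same reshaping steps can be sketched in Python. This is only an illustration of the logic, not a translation of how kdb executes it; the function name pivot and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# Rows mirror the table t from the question: (sym, server, volume).
rows = [
    ("EURUSD", "S01", 20),
    ("EURUSD", "S02", 10),
    ("AUDUSD", "S01", 30),
    ("AUDUSD", "S02", 50),
]

def pivot(rows):
    """Turn (row_key, col_key, value) triples into one value list per row_key."""
    # Step 1: the distinct column keys, sorted (the role P plays in the q code).
    columns = sorted({col for _, col, _ in rows})
    # Step 2: per row key, a col_key -> value mapping (like server!volume by sym).
    grouped = defaultdict(dict)
    for row_key, col, value in rows:
        grouped[row_key][col] = value
    # Step 3: force every row onto the same column list (like P#...),
    # using None where a combination is missing.
    table = {k: [d.get(c) for c in columns] for k, d in sorted(grouped.items())}
    return columns, table

columns, table = pivot(rows)
print("sym", *columns)
for sym, values in table.items():
    print(sym, *values)
```

Taking the column list from the data in step 1 is what makes the pivot dynamic: a new server value becomes a new column without any change to the code.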

One tricky thing about pivoting a table: the keys of the dictionary must be of type symbol, otherwise the result is not a table.
For example, the following table has a column dt of type date.
t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);dt:(0 1 0 1+.z.d)];volume:(20;10;30;50))
If we pivot it with the dates as columns, the result is a dictionary of dictionaries rather than a table:
q)P:asc exec distinct dt from t
q)exec P#(dt!volume) by sym:sym from t
(`s#flip (enlist `sym)!enlist `s#`AUDUSD`EURUSD)!((`s#2018.06.22 2018.06.23)!30j, 50j;(`s#2018.06.22 2018.06.23)!20j, 10j)
To get the dates as the columns, the dt column has to be cast to symbol:
q)show P:asc exec distinct `$string dt from t
`s#`2018.06.22`2018.06.23
q)exec P#((`$string dt)!volume) by sym:sym from t
sym   | 2018.06.22 2018.06.23
------| ---------------------
AUDUSD| 30         50
EURUSD| 20         10

Related

Pivot table with multiple value columns in KDB+

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time       side price size
--------------------------
2010.01.01 Buy  100   30
2010.01.01 Sell 101   50
To the table below with a single row:
tb1: ([] time: enlist 2010.01.01; price_buy: enlist 100; price_sell: enlist 101; size_buy: enlist 30; size_sell: enlist 50)
time       price_buy price_sell size_buy size_sell
--------------------------------------------------
2010.01.01 100       101        30       50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time      | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100       30       101        50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have the format date, time, sym, etc.
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time      | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101       50       100      30
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time      | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100      101       30      50
I have a pivot function on GitHub, but it doesn't support multiple value columns:
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join to achieve expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.
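As a rough illustration of what multi-column support amounts to, here is a small Python sketch (not q, and not part of the function above). It builds one output column per combination of value column and side. The name pivot_multi and the value_side naming scheme are assumptions chosen to match the column names asked for in the question.

```python
# Rows mirror tb from the question: (time, side, price, size).
rows = [
    ("2010.01.01", "Buy", 100, 30),
    ("2010.01.01", "Sell", 101, 50),
]
value_names = ("price", "size")

def pivot_multi(rows, value_names):
    """Collapse rows to one record per time, with one field per (value, side) pair."""
    out = {}
    for time, side, *values in rows:
        record = out.setdefault(time, {})
        for name, value in zip(value_names, values):
            # e.g. "price" and "Buy" become the field "price_buy"
            record[f"{name}_{side.lower()}"] = value
    return out

wide = pivot_multi(rows, value_names)
print(wide)
```

If the same (time, side) pair occurred more than once, the later row would silently overwrite the earlier one here; a real implementation would need an aggregation such as first or sum, which is the role ff plays in the q function.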

Adding days to sysdate based off of values in a column

I am trying to create a manual table based off of a currently built views table.
The structure of this current table is as follows:
ID | Column1 | Column2 | Buffer Days
1 | Asdf | Asdf1 | 91
2 | Qwert | Qwert1 | 11
3 | Zxcv | Zxcv1 | 28
The goal is to add a 4th column after Buffer Days that lists the sys date + the number in buffer days
So the outcome would look like:
ID | Column1 | Column2 | Buffer Days | Lookout Date
1 | Asdf | Asdf1 | 91 | 02-Jan-18
That requirement smells like a virtual column candidate. However, it won't work:
SQL> create table test
2 (id number,
3 column1 varchar2(10),
4 buffer_days number,
5 --
6 lookout_date as (SYSDATE + buffer_days) --> virtual column
7 );
lookout_date as (SYSDATE + buffer_days)
*
ERROR at line 6:
ORA-54002: only pure functions can be specified in a virtual column expression
That is expected, because SYSDATE is a non-deterministic function (it doesn't return the same value every time it is invoked).
Why not an "ordinary" column in the existing table? Because you shouldn't store values that are calculated from other table columns anyway. For example, good old Scott's EMP table contains SAL and COMM columns. It doesn't (and shouldn't) contain a TOTAL_SAL column (as SAL + COMM) because, when SAL and/or COMM changes, you have to remember to update TOTAL_SAL as well.
Therefore, a view is what could help here. For example:
SQL> create table test
2 (id number,
3 column1 varchar2(10),
4 buffer_days number
5 );
Table created.
SQL> create or replace view v_test as
2 select id,
3 column1,
4 buffer_days,
5 sysdate + buffer_days lookout_date
6 from test;
View created.
SQL> insert into test (id, column1, buffer_days) values (1, 'asdf', 5);
1 row created.
SQL> select sysdate, v.* from v_test v;
SYSDATE            ID COLUMN1    BUFFER_DAYS LOOKOUT_DA
---------- ---------- ---------- ----------- ----------
23.12.2017          1 asdf                 5 28.12.2017
SQL>

LIKE operator in Postgresql

Is it possible to use the LIKE operator in a query to find values that reside in a numeric datatype column?
For example,
Table sample
ID | VALUE(numeric)
1 | 1.00
2 | 2.00
select * from sample where VALUE LIKE '1%'
Please clear my doubt...
If I understood you correctly, then the following could be a solution for you.
Consider this sample:
create table num12 (id int,VALUE numeric);
insert into num12 values (1,1.00),(2,2.00);
insert into num12 values (3,1.50),(4,1.90);
The table looks like this:
id value
-- -----
1 1.00
2 2.00
3 1.50
4 1.90
select * from num12 where value =1
will return only a single row:
id value
-- -----
1 1.00
If you want to select every value whose integer part is 1 (I guess this is what you're after), use
select * from num12 where trunc(value) =1
result:
id value
-- -----
1 1.00
3 1.50
4 1.90
Is it possible using LIKE operator to write a query to find values
that reside in a numeric datatype column?
Answer: Yes
You can use select * from num12 where value::text like '1%'
Note: on this sample it yields the same result as shown above, but it's not a good method, because the pattern also matches values such as 10.00 or 123.45 and not only those between 1 and 2.
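The difference between the two predicates can be seen in a small Python sketch. It only imitates the comparison logic and does not run against PostgreSQL. Row 5 (10.00) is a hypothetical extra row, not part of the sample table, added to show where the text match and the truncation diverge.

```python
from decimal import Decimal

# ids 1-4 mirror the num12 sample; id 5 is a hypothetical extra row.
values = {
    1: Decimal("1.00"),
    2: Decimal("2.00"),
    3: Decimal("1.50"),
    4: Decimal("1.90"),
    5: Decimal("10.00"),
}

# Imitates: where value::text like '1%'  (a text prefix match)
prefix_match = [i for i, v in values.items() if str(v).startswith("1")]

# Imitates: where trunc(value) = 1  (a numeric comparison)
trunc_match = [i for i, v in values.items() if int(v) == 1]

print(prefix_match)  # includes id 5, because "10.00" starts with "1"
print(trunc_match)   # only the values from 1 up to, but not including, 2
```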

Redshift Postgres SQL comparing NULL vs NOT NULL values in a table

I am trying to create a query in Redshift DB (Postgres SQL) to do the following:
I have columns that I am checking for quality control and need the percentages of NULL vs. NOT NULL for each column. I would like my output to look like this, below shows the totals but need it in % if possible. How can I write this query?
Column     NOT NULL   NULL   Total Records   Percentage NULL
---------  --------   ----   -------------   ---------------
Column A   78         10     88              11.3%
Column B   68         15     83              18.0%
Column C   3          5      8               62.5%
With SQL, you can calculate the values for a specific column, like this:
select
count(a) as "NOT_NULL",
count(*) - count(a) as "NULL",
count(*) as "Total Records",
to_char(100.0 * (count(*) - count(a)) / count(*), '999.9%') as "Percentage NULL"
from stack
However, a single query like this cannot display "one row per column". You would have to combine several such queries, one per column (for example with UNION ALL), to produce that result.
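One way to combine the per-column queries is to generate one SELECT per column and glue them together with UNION ALL. The sketch below does this in Python against an in-memory SQLite database, purely so that it is runnable. The columns a and b and their contents are made-up sample data, and SQLite's round is used in place of to_char. The column names come from a hard-coded, trusted list, since they are formatted straight into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stack (a integer, b integer)")
# Hypothetical sample data: a has 1 NULL out of 4 rows, b has 2 NULLs out of 4.
conn.executemany(
    "insert into stack values (?, ?)",
    [(1, None), (2, 5), (None, None), (4, 7)],
)

columns = ["a", "b"]  # trusted names only; they are interpolated into the SQL
per_column = """
select '{col}' as column_name,
       count({col}) as not_null,
       count(*) - count({col}) as is_null,
       count(*) as total_records,
       round(100.0 * (count(*) - count({col})) / count(*), 1) as pct_null
from stack
"""
query = " union all ".join(per_column.format(col=c) for c in columns)

result = sorted(conn.execute(query).fetchall())
for row in result:
    print(row)
```

Each branch scans the table once, so for a wide table on a large dataset it may be cheaper to compute all counts in one pass and reshape the single result row on the client.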

need help writing a date sensitive T-SQL query

I need help writing a T-SQL query that will generate 52 rows of data per franchise from a table that will often contain gaps in the 52 week sequence per franchise (i.e., the franchise may have reported data bi-weekly or has not been in business for a full year).
The table I'm querying against looks something like this:
FranchiseId | Date | ContractHours | PrivateHours
and I need to join it to a table similar to this:
FranchiseId | Name
The output of the query needs to look like this:
Name   Date         ContractHours   PrivateHours
----   ----------   -------------   ------------
AZ1    08-02-2011   292             897
AZ1    07-26-2011   0               0              -- default to 0's for gaps in sequence
...
AZ1    08-03-2010   45              125            -- row 52 for AZ1
AZ2    08-02-2011   382             239
...
AZ2    07-26-2011   0               0              -- row 52 for AZ2
I need this style of output for every franchise, i.e., 52 rows of data with default rows for any gaps in the 52 week sequence, in a single result set. Thus, if there are 100 franchises, the result set should be 5200 rows.
What I've Tried
I've tried the typical suggestions of:
1. Create a table with all possible dates
2. LEFT OUTER JOIN this to the table of data needed
The problems I'm running into are:
1. ensuring that for every franchise there are 52 rows, and
2. filling in gaps with the franchise name and 0 for hours. I can't have the following in the result set:
Name   Date         ContractHours   PrivateHours
----   ----------   -------------   ------------
NULL   08-02-2011   NULL            NULL
I don't know where to go from here. Is there an efficient way to write a T-SQL query that will produce the required output?
The bare bones are:
1. Generate 52 week ranges
2. Cross join with Franchise
3. LEFT JOIN the actual data
4. ISNULL to substitute zeroes
So, like this, untested:
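;WITH cDATE AS
(
    SELECT
        CAST('20100101' AS date) AS StartOfWeek,
        DATEADD(day, 6, CAST('20100101' AS date)) AS EndOfWeek
    UNION ALL
    -- "date + int" is not valid for the date type, so DATEADD is used instead
    SELECT DATEADD(day, 7, StartOfWeek), DATEADD(day, 7, EndOfWeek)
    FROM cDATE
    -- note: this bound yields 53 weekly rows; tighten it if exactly 52 are needed
    WHERE DATEADD(day, 7, StartOfWeek) < '20110101'
), Possibles AS
(
    -- every week paired with every franchise; EndOfWeek is needed by the join below
    SELECT
        StartOfWeek, EndOfWeek, FranchiseID
    FROM
        cDATE CROSS JOIN Franchise
)
SELECT
    P.FranchiseID,
    P.StartOfWeek,
    ISNULL(SUM(O.ContractHours), 0) AS ContractHours,
    ISNULL(SUM(O.PrivateHours), 0) AS PrivateHours
FROM
    Possibles P
    LEFT JOIN
    TheOtherTable O ON P.FranchiseID = O.FranchiseID AND
        O.Date BETWEEN P.StartOfWeek AND P.EndOfWeek
GROUP BY
    P.FranchiseID,
    P.StartOfWeek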
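The four steps (generate the weeks, cross join with the franchises, left join the data, substitute zeroes) can also be checked outside the database. The Python sketch below is only a model of that logic under simplifying assumptions: the function name weekly_grid and its parameters are made up for this example, the sample hours are illustrative, and reported dates are assumed to fall exactly on the generated week dates rather than anywhere inside a week range.

```python
from datetime import date, timedelta

def weekly_grid(franchises, reported, last_week, weeks=52):
    """Return one row per franchise per week, newest first, zero-filling gaps.

    franchises: {franchise_id: name}
    reported:   {(franchise_id, week_date): (contract_hours, private_hours)}
    """
    # Step 1: the week dates, counting back from last_week.
    week_dates = [last_week - timedelta(weeks=i) for i in range(weeks)]
    rows = []
    for fid, name in sorted(franchises.items()):      # step 2: the cross join
        for week in week_dates:
            # Steps 3 and 4: look up the data, defaulting to zeroes for a gap.
            contract, private = reported.get((fid, week), (0, 0))
            rows.append((name, week, contract, private))
    return rows

franchises = {1: "AZ1", 2: "AZ2"}
reported = {
    (1, date(2011, 8, 2)): (292, 897),
    (1, date(2010, 8, 10)): (45, 125),
    (2, date(2011, 8, 2)): (382, 239),
}
rows = weekly_grid(franchises, reported, last_week=date(2011, 8, 2))
print(len(rows))
print(rows[0])
print(rows[1])
```

Because the franchise name comes from the outer loop and not from the reported data, a gap row can never have a missing name, which is the problem described in the question.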