Difference between rows in KDB/Q - kdb

I'm new to KDB/Q and have a question around getting the difference between two (not necessarily adjacent) rows.
I have only one table, which looks like the below:
q)tickers:`ibm`bac`dis`gs`ibm`gs`dis`bac
q)pxs:100 50 30 250 110 240 45 48
q)dates:2013.05.01 2013.01.05 2013.02.03 2013.02.11 2013.06.17 2013.06.21 2013.04.24 2013.01.06
q)trades:([tickers;dates];pxs)
q)trades
tickers dates | pxs
------------------| ---
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110
gs 2013.06.21| 240
dis 2013.04.24| 45
bac 2013.01.06| 48
I would like either another column in the table that stores the difference between the current and the previous price, or another structure of similar shape. The key question the result needs to answer is "by how much did the stock change compared to the previous time a price was recorded?"
So far I've tried something along the lines of:
select tickers, dates, pxs - pxs(dates bin (exec dates from trades where tickers = trades.tickers)) from trades
which doesn't really work (at all). Definitely due to trying to do SQL-like queries and having a row-oriented mindset.
Please find below an example of the sought-after answer:
q)trades: do magic with trades
q)trades
tickers dates | pxs | delta
------------------| --- | -----
ibm 2013.05.01| 100 | 0
bac 2013.01.05| 50 | 0
dis 2013.02.03| 30 | 0
gs 2013.02.11| 250 | 0
ibm 2013.06.17| 110 | 10
gs 2013.06.21| 240 | -10
dis 2013.04.24| 45 | 15
bac 2013.01.06| 48 | -2
Thanks for your help,
Dan

q)update delta:{0,1_deltas x}pxs by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100 0
bac 2013.01.05| 50 0
dis 2013.02.03| 30 0
gs 2013.02.11| 250 0
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2

If you do:
select pxs by tickers from trades
you will have a nested column (pxs) holding the list of prices for each ticker. You can then apply deltas:
select deltas pxs by tickers from trades
which gives the running difference within each ticker. The first value of each list is the original price, though, so you'll need to set it to 0.
EDIT
Having re-read the question and looked at your expected result, you'll need to join this back to your original trades table.

update delta:0N,(1_ pxs)-(-1_ pxs) by tickers from trades
Here is how it works:
select pxs by tickers from trades
creates a table in which each row contains a ticker and the list of its pxs.
So in every row we have a list:
tickers| pxs
-------| -------
bac | 50 48
dis | 30 45
gs | 250 240
ibm | 100 110
Now we have to apply a function that calculates the delta. The best choice is the one mentioned above, deltas, but my version does much the same.
If we use select, we get a table of tickers | list of pxs | list of deltas; if we use update ... by, the grouped values are ungrouped back into the original rows.
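The difference between the two forms can be sketched on a cut-down table. The four-row trades table and the names grouped and flat are made up for illustration and are not part of the question's data.

```q
/ hypothetical cut-down table, unkeyed for simplicity
trades:([]tickers:`ibm`bac`ibm`bac; pxs:100 50 110 48)

/ select ... by: one row per ticker, each cell holds a list
/ deltas leaves the first price of each ticker unchanged
grouped:select pxs, delta:deltas pxs by tickers from trades

/ update ... by: same per-ticker computation, but the result
/ has one row per original row, in the original row order
flat:update delta:deltas pxs by tickers from trades
```

With update, no separate ungroup or join back to the source table is needed.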

You can get the same results using the prev function. One thing worth highlighting is that prev automatically puts a null (0N) in the first position. This matters because no previous price is available for the first record, whereas a 0 there would suggest that the price did not change. In the end it depends on how you want to handle the first record.
q)update delta:pxs-prev[pxs] by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2
Using deltas to get the same results (0N instead of 0):
q)update delta:{0N,1_deltas x}pxs by tickers from trades
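All of the queries above take "previous" to mean the previous row for that ticker. The sample data happens to be in date order within each ticker; if that were not guaranteed, one option is to sort first. The following is a minimal sketch under that assumption, with sorted and res as made-up names.

```q
/ rebuild the sample table from the question
tickers:`ibm`bac`dis`gs`ibm`gs`dis`bac
pxs:100 50 30 250 110 240 45 48
dates:2013.05.01 2013.01.05 2013.02.03 2013.02.11 2013.06.17 2013.06.21 2013.04.24 2013.01.06
trades:([tickers;dates];pxs)

/ unkey, then sort by ticker and date so prev really is the earlier record
sorted:`tickers`dates xasc 0!trades

/ per-ticker difference; the first record of each ticker gets a null
res:update delta:pxs-prev pxs by tickers from sorted

/ restore the key columns if a keyed result is wanted
res:`tickers`dates xkey res
```

Note that the sort changes the row order relative to the original table.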

Related

query for selecting N records

I have a table tab that has cols date,sym,value and is sorted from oldest date to the recent.
I am trying to select the last N records for each sym and am not sure how to write the query. I know that I can select on a date range, but I need the selection per sym, regardless of whether the sym appeared on consecutive dates.
You could do this with fby and the virtual row number column i:
https://code.kx.com/q/ref/fby/
q){ select from tab where ({y in x#y}[x];i) fby sym }[-2]
date sym time src price size
------------------------------------------------------------
2014.04.21 AAPL 2014.04.21D16:29:03.253000000 N 24.98 3561
2014.04.21 AAPL 2014.04.21D16:29:03.558000000 N 24.98 2733
2014.04.21 CSCO 2014.04.21D16:28:56.265000000 O 35.6 8390
2014.04.21 CSCO 2014.04.21D16:29:44.572000000 L 35.61 2286
2014.04.21 DELL 2014.04.21D16:29:35.374000000 L 29.57 1444
2014.04.21 DELL 2014.04.21D16:29:39.979000000 N 29.56 216
2014.04.21 GOOG 2014.04.21D16:29:50.569000000 N 41.87 722
2014.04.21 GOOG 2014.04.21D16:29:58.633000000 O 41.9 437
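How the fby filter works can be sketched on a small table. The table below is invented for illustration, and the column is called val rather than value because value is a built-in q function; lastTwo is a made-up name.

```q
/ hypothetical six-row table, already sorted by date
tab:([]date:2021.01.01+til 6; sym:`a`b`a`b`a`b; val:10 20 30 40 50 60)

/ i is the virtual row-index column
/ fby hands the lambda the row indices belonging to each sym;
/ the lambda flags those that are among the last 2 of the group
lastTwo:select from tab where ({x in -2#x};i) fby sym
```

Here sym a occupies rows 0 2 4 and sym b rows 1 3 5, so the filter keeps rows 2, 3, 4 and 5.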
Edit: A faster way would be to use a functional select with the 5th argument n (number of records) for each sym.
raze{
  // arguments: [table; where; by; columns; rows]
  ?[tab;enlist (in;`sym;enlist x);0b;();y]
  }[;-2]'[distinct tab[`sym]]
https://code.kx.com/q/basics/funsql/
Matt's suggestions using an fby and functional select are best if you want all columns in the table returned. If you only need the date, sym & price columns returned you could use
q)ungroup select -2#date,-2#price by sym from trade
sym date price
----------------------
APPL 2021.03.13 111.77
APPL 2021.03.13 111.85
CAT 2021.03.13 246
CAT 2021.03.13 246.27
GOOG 2021.03.13 206.24
GOOG 2021.03.13 206.21
NYSE 2021.03.13 60.67
NYSE 2021.03.13 60.97
Note that this can become tedious when selecting a large number of columns. In those cases it's better to stick with Matt's suggestions.

Pivot table with multiple value columns in KDB+

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
To the table below with single row:
tb1: ([] enlist time: 2010.01.01; enlist price_buy:100; enlist price_sell:101; enlist size_buy:30; enlist size_sell:50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100 30 101 50
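A variation on the same join builds the column names from the side values instead of hard-coding them. This is a sketch rather than a tuned solution: it assumes every time has the same set of sides, and the names p, s and res are made up for illustration.

```q
tb:([]time:2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)

/ one dictionary per time, keyed by price_buy, price_sell
p:exec (`$"price_",/:string lower side)!price by time:time from tb

/ same idea for the size column
s:exec (`$"size_",/:string lower side)!size by time:time from tb

/ both results are keyed on time, so a left join lines them up
res:p lj s
```

The column order in res is the price columns followed by the size columns.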
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have the format date time sym etc
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101 50 100 30
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100 101 30 50
I have a pivot function on GitHub, but it doesn't support multiple columns:
.math.st.pivot: {[t;rc;cf;ff]
  P: asc distinct t cf;
  Pcol: `$string[P] cross "_",/:string key ff;
  t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
  t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
  ![t;();0b;key ff]
  };
But you can left join to achieve the expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
  lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.

Cannot access the value of the kdb table column

I am new to KDB and cannot understand why I can access the order column for the table stock but not for trader. Below is my code with the error.
q)trader
item brand | price order
---------------| -----------
soda fry | 1.5 200
bacon prok | 1.99 180
mushroom veggie| 0.88 110
eggs veggie| 1.55 210
tomatoes veggie| 1.35 100
q)trader.order
'order
[0] trader.order
^
q)stock.order
50 82 45 92
q)stock
item brand price order
---------------------------
soda fry 1.5 50
bacon prok 1.99 82
mushroom veggie 0.88 45
eggs veggie 1.55 92
q)trader.order
'order
[0] trader.order
^
Your table trader is keyed, so you cannot use trader.order to select the order column.
You can use this instead:
(0!trader)`order
The reason is that trader.order is really indexing, the same as list.index. A simple table behaves like a list of dictionaries, so dot (.) indexing by column name works on it. A keyed table is a dictionary mapping a table of keys to a table of values, so it does not have the same structure and you have to unkey it first.
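A few equivalent ways to reach the column, sketched on a two-row version of the table (the shortened data is made up for illustration):

```q
/ hypothetical two-row keyed table with the same columns as trader
trader:([item:`soda`bacon; brand:`fry`prok]; price:1.5 1.99; order:200 180)

(0!trader)`order        / unkey, then index by column name
(value trader)`order    / the value part of a keyed table is a simple table
exec order from trader  / qSQL works on keyed tables directly
key trader              / the key columns form a table of their own
```

The first three expressions all return the list 200 180.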

How to sum multiple elements from single record

I have table trade:([]time:`time$(); sym:`symbol$(); price:`float$(); size:`long$())
with e.g. 1000 records, with e.g. 10 unique syms. I want to sum the first 4 prices for each sym.
My code looks like:
priceTable: select price by sym from trade;
amountTable: select count price by sym from trade;
amountTable: `sym`amount xcol amountTable;
resultTable: amountTable ij priceTable;
So my new table looks like: resultTable
sym | amount price
-------| --------------------------------------------------------------
instr0 | 106 179.2208 153.7646 155.2658 143.8163 107.9041 195.521 ..
The result of command: res: select sum price from resultTable where i = 1:
price
..
----------------------------------
14.71512 153.2244 154.1642 196.5744
Now, when I want to sum elements I receive: sum res
price| 14.71512 153.2244 154.1642 196.5744 170.6052 61.26522 45.70606
46.9057..
When I want to count elements in res: count res
1
I assume that res is a single record with many values. How can I sum all of those values, or how can I sum the first four?
You can use "each" to run the sum on each row:
select sum each price from res
Or if you want to run it on resultTable:
select sum each price from resultTable
To sum the first four prices for each row, use a dyadic each-right:
select sum each 4#/:price from resultTable
Or you could do all of this very easily, in one step:
select COUNT:count i, SUM:sum price, SUM4:sum 4#price by sym from trade
q)trade:([]time:10?.z.d; sym:10#`a`b`c; price:100.+til 10; size:10+til 10)
One caveat with the take (#) operator: if the list has fewer elements than the take count, it treats the list as circular and starts returning repeated items. E.g. check out the 4th price for symbols b and c.
q)select 4#price by sym from trade
sym| price
---| ---------------
a | 100 103 106 109
b | 101 104 107 101 //101 - 2 times
c | 102 105 108 102 //102 - 2 times
Using sublist ensures that if the list has fewer elements than the count argument, it just returns the shorter list.
q)select sublist[4;price] by sym from trade
sym| price
---| ----------------
a | 100 103 106 109f
b | 101 104 107f
c | 102 105 108f
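Putting the two points together, the one-step query can use sublist instead of take so that short groups are not padded with repeated prices. This is a minimal sketch on the same sample data; the column names n, total and first4 are made up for illustration.

```q
/ same shape as the sample above: sym a has 4 rows, b and c have 3 each
trade:([]sym:10#`a`b`c; price:100.+til 10)

/ count, full sum, and sum of at most the first 4 prices per sym
select n:count i, total:sum price, first4:sum sublist[4;price] by sym from trade
```

For b and c, first4 equals total because each has only three prices.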

Crystal Report will not sum the duplicated record

I'm working in Crystal Reports 2008, and I have data like this:
Date KG MT SQM
--------------------------------------------
01-01-2013 25000 25
01-01-2013 15000 15
01-01-2013 15000 15 -----(duplicated)
01-02-2013 13000 13
01-02-2013 12000 12
01-03-2013 18000 18
01-03-2013 33000 33
Then I grouped it by date, so the output looks like this. It already removes the duplicate and sums the values without a problem.
Date KG MT SQM
--------------------------------------------
01-01-2013 40000 40 55000
01-02-2013 25000 25 25000
01-03-2013 51000 51 51000
TOTAL 116,000 116 131,000
Then I want to add the SQM, and when I sum it, it sums all the records in the group, including the duplicated record.
(If you sum the 3rd field, the duplicated record is included in the sum; the total should be only '116,000' and not '131,000'.)
My question: how can I suppress the duplicates and at the same time have the group summary show the correct value?
Thanks,
Captain16