xbar rounding up and application of last - kdb

q) t:([]time:(2021.01.31D17:50:19.986000000;2021.01.31D18:01:32.894000000;2021.01.31D18:02:08.884000000;2021.01.31D18:25:25.984000000;2021.01.31D18:25:27.134000000;2021.01.31D18:25:28.834000000;2021.01.31D18:25:29.934000000);val:(3.2;2.9;3.9;6.8;5.0;3.0;2.2);sym:(`AUD;`AUD;`AUD;`AUD;`AUD;`AUD;`AUD))
time val sym
-------------------------------------
2021.01.31D17:50:19.986000000 3.2 AUD
2021.01.31D18:01:32.894000000 2.9 AUD
2021.01.31D18:02:08.884000000 3.9 AUD
2021.01.31D18:25:25.984000000 6.8 AUD
2021.01.31D18:25:27.134000000 5 AUD
2021.01.31D18:25:28.834000000 3 AUD
2021.01.31D18:25:29.934000000 2.2 AUD
prices: 0!select last val by sym, 0D00:01+0D00:01 xbar time from t
sym x val
-------------------------------------
AUD 2021.01.31D17:51:00.000000000 3.2
AUD 2021.01.31D18:02:00.000000000 2.9
AUD 2021.01.31D18:03:00.000000000 3.9
AUD 2021.01.31D18:26:00.000000000 2.2
For the first row in prices, for example, how does q ensure that val is not the last value between 2021.01.31D17:51:00.000000000 and 2021.01.31D17:52:00.000000000, but rather the last value between 2021.01.31D17:50:00.000000000 and 2021.01.31D17:51:00.000000000? I ask because the query involves 0D00:01+0D00:01 xbar time and not just 0D00:01 xbar time.
Appreciate your help.

kdb+ still evaluates right-to-left within the sub-components of a select statement, so
0D00:01+0D00:01 xbar time
is evaluated as
0D00:01+(0D00:01 xbar time)
i.e. the xbar bucketing happens first and the additional 0D00:01 is added afterwards. So the 0D00:01+ really only affects the "display" of the result, not the values used in the grouping.
This is what you possibly think kdb+ would confuse it for:
0D00:01 xbar 0D00:01+time
Here the times are bumped up before the xbar/grouping rather than after it. But because the shift is exactly one bucket width, the results would actually be the same: this is really just a labelling exercise.
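To convince yourself, a quick sketch using the t defined above: the grouping key is the plain xbar bucket, and the added 0D00:01 merely relabels it; because the shift equals the bucket width, shifting before or after bucketing gives identical results.
q)select time, bucket:0D00:01 xbar time, label:0D00:01+0D00:01 xbar time from t
q)(0D00:01+0D00:01 xbar t`time)~0D00:01 xbar 0D00:01+t`time   / 1b: the shift commutes with xbar here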

Related

most performant way to get asof price given a list of timestamps

I have a list of timestamps spanning multiple dates (no sym, just timestamps). There can be 1000-2000 of them at a time.
What's the most performant way to hit an HDB and get the closest price available for each timestamp?
select from hdbtable where date = x -> this can be over 60mm rows.
Doing this for each date and then an aj on top performs very poorly.
Any suggestions are welcome.
The most performant way to aj, assuming the HDB follows the standard conventions of date-partitioned with `p# attribute on sym, is
aj[`sym`time;select sym,time,other from myTable where …;select sym,time,price from prices where date=x]
There should be no additional filters/where-clause on the prices table other than date; any further filtering would lose the `p# attribute on the sym column and significantly slow the aj.
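For instance, a hypothetical single-date call following this template (myTable, other and prices are the placeholder names from above):
aj[`sym`time;
  select sym,time,other from myTable where date=2021.01.22;
  select sym,time,price from prices where date=2021.01.22]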
You're saying you have no syms, just timestamps, but what does that mean? Do you want the price of all syms at each timestamp, or the last price of any sym at that timestamp? The former is easy: join your timestamps to your distinct sym list and use that as the "left" table in the aj. The latter is not as easy, because the HDB data likely isn't fully sorted on time; it's likely sorted by sym and then time. In that case you might again join your timestamps to your distinct sym list, aj for the price of all syms, and from that result take the one with the max time.
So I guess it depends on a few factors. More info might help.
EDIT: suggestion based on further discussion:
targetTimes:update targetTime:time from ([]time:"n"$09:43:19 10:27:58 13:12:11 15:34:03);
res:aj0[`sym`time;(select distinct sym from trade where date=2021.01.22)cross targetTimes;select sym,time,price from trade where date=2021.01.22];
select from res where not null price,time=(max;time)fby targetTime
sym time targetTime price
----------------------------------------------------
AQMS 0D09:43:18.999937967 0D09:43:19.000000000 4.5
ARNA 0D10:27:57.999842638 0D10:27:58.000000000 76.49
GE 0D15:34:02.999979520 0D15:34:03.000000000 11.17
HAL 0D13:12:10.997972224 0D13:12:11.000000000 18.81
This gives the price of whichever sym is closest to your targetTime. Then you would peach this over multiple dates:
{targetTimes: ...;res:aj0[...];select from res ...}peach mydates;
Note that what's making this complicated is your requirement that it be the price of any sym that's closest to your sym-less targetTimes. This seems strange - usually you would want the price of sym(s) as of a particular time, not the price of anything closest to a particular time.
You can use multithreading to optimise your query, with each thread being assigned a date to process, essentially utilising more than one core:
{select from hdbtable where date = x} peach listofdates
Note that peach only parallelises if the q process was started with secondary threads (the -s command-line flag); otherwise it runs serially. More info on multithreading and on peach can be found in the kdb+ documentation at code.kx.com.

kdb q - count subtable between 2 dates

I have a table
t:`date xasc ([]date:100?2018.01.01+til 100;price:100?til 100;acc:100?`a`b)
and would like to have a new column in t containing the count of entries in t whose date lies in the range of the previous 14 days and whose account matches acc. For example, if there is a row
date price acc prevdate prevdate1W countprev14
2018.01.10 37 a 2018.01.09 2018.01.03 ?
then countprev14 should contain the number of observations between 2018.01.03 and 2018.01.09 where acc=a
The way I am currently doing it can probably be improved:
f:{[dates;ac;t]count select from t where date>=(dates 0),date<=(dates 1),acc=ac}[;;t]
(f')[(exec date-7 from t),'(exec date-1 from t);exec acc from t]
Thanks for the help
Another method is using a window join (wj1), which considers only the rows that fall inside each window:
https://code.kx.com/q/ref/joins/#wj-wj1-window-join
dates:exec date from t;
d:(dates-7;dates-1);   / window boundary pairs: [date-7;date-1] for each row
wj1[d;`acc`date;t;(`acc`date xasc t;(count;`i))]   / count the rows of the sorted table landing in each window, matched on acc
I think you're looking for something like this:
update count14:{c-0^(c:sums 1&x)y bin y-14}[i;date] by acc from t
this uses sums to get the running counts, bin to find the running count from 14 days prior, and indexes back into the list of running counts to get the count as of that date.
The difference between the count then and the count now is the number of observations from the last 14 days.
Note that the lambda here lets us store the result of the sums easily and avoid unnecessary recomputation.
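To see the mechanics on a toy date list (an illustrative sketch, not from the original answer):
q)d:2018.01.01 2018.01.03 2018.01.10 2018.01.12 2018.01.20
q)c:sums 1&til count d   / running counts: 0 1 2 3 4
q)d bin d-14             / index of the last date at or before each date-14: -1 -1 -1 -1 1
q)c-0^c d bin d-14       / 0 1 2 3 3: the running count now minus the count as of 14 days prior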

How to find the days having a drawdown greater than X bips?

What would be the most idiomatic way to find the days with a drawdown greater than X bips? I again worked my way through some queries, but they turn into boilerplate... maybe there is a simpler, more elegant alternative:
q)meta quotes
c | t f a
----| -----
date| z
sym | s
year| j
bid | f
ask | f
mid | f
then I do:
bips:50;
`jump_in_bips xdesc distinct select date,jump_in_bips from (update date:max[date],jump_in_bips:(max[mid]-min[mid])%1e-4 by `date$date from quotes where sym=accypair) where jump_in_bips>bips;
but this will give me the days on which there was a jump of that many bips, not only the drawdowns.
I can of course put the result above in a temporary table and run several follow-up selects like:
select ... where mid=min(mid),date=X
select ... where mid=max(mid),date=X
to check that the max(mid) occurred before the min(mid)... is there a simpler, more idiomatic way?
I think maxs is the key function here, as it allows you to maintain a running historical maximum and compare each value to that maximum. If you have some table quote which contains a series of mids (mid) and timestamps (date), the following query should return the days on which you saw a drawdown greater than a certain value:
key select by `date$date from quote
where bips<({(maxs[x]-x)%1e-4};mid) fby `date$date
The lambda {(maxs[x]-x)%1e-4} computes, at each point, the drawdown in bips from the running historical maximum; fby applies it group-wise by date, and the where clause keeps the rows where it exceeds bips. Grouping with a by on date and taking the key then returns the days on which this occurred.
If you want to preserve the information for the max drawdown you can use an update instead:
select max draw by date from
(update draw:(maxs[mid]-mid)%1e-4 by date from @[quote;`date;`date$])
where bips<draw
The date column is cast separately, with a direct amend of quote, to avoid repeated casting inside the query.
The difference between the max and min mids for a given date may be either an increase or a drawdown, depending on whether the max mid precedes the min. Also, since a sym column exists, I assume you may have different symbols in the table and want to get drawdowns for all of them.
For example, if there are 3 quotes for a given day and sym: 1.3000 1.2960 1.3010, then the difference between the 2nd and 3rd is 50 pips, but this is an increase.
The next query can be used to get dates and symbols with drawdown higher than given threshold
select from
(select drawdown: {max maxs[x]-x}mid
by date, sym from quotes)
where drawdown>bips*1e-4
{max maxs[x]-x} gives the maximum drawdown for a given date by subtracting each mid from the maximum of the preceding mids.
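As a quick check on the three-quote example above (an illustrative sketch, not from the original answer):
q)mid:1.3000 1.2960 1.3010
q)(max[mid]-min mid)%1e-4    / 50: the naive range, which here is actually an increase
q)max[maxs[mid]-mid]%1e-4    / 40: the true maximum drawdown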

How to optimize a batch pivotization?

I have a datetime list (which, for some reason, I call date) containing over 1k datetimes.
adates:2017.10.20T00:02:35.650 2017.10.20T01:57:13.454 ...
For each of these dates I need to select the data from some table, then pivot by a column t (the expiry), add the corresponding datetime as a column to the pivoted table, and stitch the pivots for all the dates together. Note that I must be able to identify which pivot corresponds to which date, which is why I do it one date at a time:
fPivot:{[adate;accypair]
t1:select from volatilitysurface_smile where date=adate,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`t xkey 0!exec mycols#(stype!mid) by t:t from t1;
t3:`t xkey select distinct t,tenor,xi,volofvol,delta_type,spread from t1;
result:ej[`t;t2;t3];
:result}
I then call this function for every datetime in adates as follows:
raze {[accypair;adate] `date xcols update date:adate from fPivot[adate;accypair] }[`EURCHF] @/: adates;
this takes about 90s. I wonder if there is a better way, e.g. doing one big pivot rather than running one pivot per date and stitching it all together. The big issue I see is that I have no apparent way to include the date attribute as part of the pivot, and the date cannot be lost, otherwise I can't reconcile the results.
If you haven't been to the wiki page on pivoting then it may be a good start. There is a section on a general pivoting function that makes some claims to being somewhat efficient.
One user reports:
This is able to pivot a whole day of real quote data, about 25 million quotes over about 4000 syms and an average of 5 levels per sym, in a little over four minutes.
As for general comments, I would say that the ej is unnecessary, as it is a more general version of ij that allows you to specify the key column. As both t2 and t3 are keyed the same way, I would instead use:
t2 ij t3
which may give you a very minor performance boost.
OK, I solved the issue by creating a batch version of the pivot that keeps the date (datetime) field in the group-by needed to pivot, i.e. changing by t:t from ... to by date:date,t:t from .... It went from 90s down to 150 milliseconds.
fBatchPivot:{[adates;accypair]
t1:select from volatilitysurface_smile where date in adates,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`date`t xkey 0!exec mycols#(stype!mid) by date:date,t:t from t1;
t3:`date`t xkey select distinct date,t,tenor,xi,volofvol,delta_type,spread from t1;
result:0!(`date`t xasc t2 ij t3);
:result}
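Hypothetical usage, a single call over all dates replacing the earlier per-date raze loop:
res:fBatchPivot[adates;`EURCHF]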

KDB Converting Subselect to Q Query

We have a Q query running on tick data which consolidates to OHLC on 1-minute bars.
select subsel:(
exec last datetime.date+1 xbar datetime.minute.z.Z
from `base
where instrument=`GBPUSD,
datetime=datetime.date+1 xbar datetime.minute.z.Z),
max(datetime),
min(datetime),
Open:first price,
High:max price,
Low:min price,
Close:last price,
Volume:count(i)
by DT:($)datetime.date+1 xbar datetime.minute.z.Z
from `base
where instrument=`GBPUSD,
datetime>=2017.07.03T10:20:00.00,
datetime<2017.07.03T10:20:59.999
The problem is that the xbar'd date is synthetic on both the main table and the 'subselect'; the exec's "datetime=" comparison needs to reference the main table, and I cannot find the alias approach to use. I considered an ej, but as both sides are synthetic I could not find the right construct there either.
There are several issues with your query before we even get to the subselect. First, datetime.minute.z.Z is invalid syntax; you probably don't need the .z.Z suffix there. Second, 1 xbar is redundant: 1 xbar x is x for integer x, and datetime.minute is integer. You can just do datetime.date+datetime.minute to get datetimes rounded to minutes. (Note that if you used timestamps, as you should, rounding would simply be 0D00:01 xbar timestamp; for datetimes you would have to precompute a minute as U:reciprocal 24*60 and use that with xbar, i.e. U xbar datetime.) Third, you cast the xbar'd values to strings in the by clause; if you really want them as strings, do that in a separate update after the aggregation. Finally, there are some minor issues such as redundant parentheses and ($), which in q should be spelled string.
Now, back to the subselect. I think once you resolve the issues highlighted above, you will find that you don't need the subquery at all: the result will already have the xbar'd datetimes in the key column. If you want the result as a regular table, just use 0!.
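A minimal sketch of what the cleaned-up query might look like (assuming the table base and column datetime from the question; the firstTime/lastTime names are illustrative):
0!select firstTime:min datetime, lastTime:max datetime,
  Open:first price, High:max price, Low:min price, Close:last price,
  Volume:count i
 by DT:datetime.date+datetime.minute from base
 where instrument=`GBPUSD,
  datetime>=2017.07.03T10:20:00.000,
  datetime<2017.07.03T10:20:59.999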