KDB Converting Subselect to Q Query

We have a Q query running on tick data which consolidates to OHLC on 1-minute bars.
select subsel:(
exec last datetime.date+1 xbar datetime.minute.z.Z
from `base
where instrument=`GBPUSD,
datetime=datetime.date+1 xbar datetime.minute.z.Z),
max(datetime),
min(datetime),
Open:first price,
High:max price,
Low:min price,
Close:last price,
Volume:count(i)
by DT:($)datetime.date+1 xbar datetime.minute.z.Z
from `base
where instrument=`GBPUSD,
datetime>=2017.07.03T10:20:00.00,
datetime<2017.07.03T10:20:59.999
The problem is that the xbar'd date is synthetic on both the main table and the 'subselect'; the exec's "datetime=" comparison needs to reference the main table, and I cannot find the alias approach to use. I considered an ej, but as both sides are synthetic I could not find that construct either.

There are several issues with your query before we even get to the subselect. First, datetime.minute.z.Z is invalid syntax; you probably don't need the .z.Z suffix there. Second, 1 xbar is redundant: 1 xbar x is x for integer x, and datetime.minute is integer. You can just do datetime.date+datetime.minute to get datetimes rounded to minutes. (Note that if you used timestamps, as you should, rounding would simply be 0D00:01 xbar timestamp; for datetimes you would have to precompute a minute as U:reciprocal 24*60 and use that with xbar, i.e. U xbar datetime.) Third, you cast the xbar'd values to strings in the by clause. If you really want them as strings, do that in a separate update after the aggregation. Finally, there are some minor issues such as redundant parentheses, and ($), which in q should be spelled string.
Now, back to the subselect. I think once you resolve the issues highlighted above, you will find that you don't need the subquery at all: the result will already have the minute-rounded datetimes in the key column. If you want the result as a regular table, just use 0!.
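Putting those fixes together, a minimal sketch of the simplified query (assuming base has the instrument, datetime and price columns from your query):
/ OHLC on 1-minute bars; the minute-rounded datetime becomes the key column DT
select Open:first price, High:max price, Low:min price, Close:last price,
    Volume:count i
    by DT:datetime.date+datetime.minute
    from base
    where instrument=`GBPUSD,
    datetime>=2017.07.03T10:20:00.000,
    datetime<2017.07.03T10:20:59.999
/ unkeyed, with DT as strings if you really need them:
/ update DT:string DT from 0!result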

Related

most performant way to get asof price given a list of timestamps

I have a list of timestamps spanning multiple dates (no sym, just timestamps). There can be 1000-2000 of them at times.
What's the most performant way to hit an HDB and get the closest price available for each timestamp?
select from hdbtable where date = x -> can be over 60mm rows.
Doing this for each date and then an aj on top performs very poorly.
Any suggestions are welcome.
The most performant way to aj, assuming the HDB follows the standard convention of date partitioning with the `p# attribute on sym, is
aj[`sym`time;select sym,time,other from myTable where …;select sym,time,price from prices where date=x]
There should be no additional filters/where-clause on the prices table other than date.
You say you have no syms, just timestamps - but what does that mean? Do you want the price of all syms at each timestamp, or the last price of any sym at that timestamp? The former is easy: you can just join (cross) your timestamps with your distinct sym list and use that as the "left" table in the aj. The latter will not be as easy, as the HDB data likely isn't fully sorted on time; it's likely sorted by sym and then time. In that case you might again have to cross your timestamps with your distinct sym list, aj for the price of all syms, and from that result take the one with the max time.
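For the first interpretation, a minimal sketch, assuming a standard date-partitioned trade table and a hypothetical timespan list tms holding your target times:
/ price of every sym as of each target time, for one date
aj[`sym`time;
 (select distinct sym from trade where date=2021.01.22)cross([]time:tms);
 select sym,time,price from trade where date=2021.01.22]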
So I guess it depends on a few factors. More info might help.
EDIT: suggestion based on further discussion:
targetTimes:update targetTime:time from ([]time:"n"$09:43:19 10:27:58 13:12:11 15:34:03);
res:aj0[`sym`time;(select distinct sym from trade where date=2021.01.22)cross targetTimes;select sym,time,price from trade where date=2021.01.22];
select from res where not null price,time=(max;time)fby targetTime
sym  time                 targetTime           price
----------------------------------------------------
AQMS 0D09:43:18.999937967 0D09:43:19.000000000 4.5
ARNA 0D10:27:57.999842638 0D10:27:58.000000000 76.49
GE   0D15:34:02.999979520 0D15:34:03.000000000 11.17
HAL  0D13:12:10.997972224 0D13:12:11.000000000 18.81
This gives the price of whichever sym is closest to your targetTime. Then you would peach this over multiple dates:
{targetTimes: ...;res:aj0[...];select from res ...}peach mydates;
Note that what's making this complicated is your requirement that it be the price of any sym that's closest to your sym-less targetTimes. This seems strange - usually you would want the price of sym(s) as of a particular time, not the price of anything closest to a particular time.
You can use multithreading to optimize your query, with each thread being assigned a date to process, essentially utilising more than just one core:
{select from hdbtable where date = x} peach listofdates
More info on multithreading and on peach can be found in the kdb+ documentation at code.kx.com.

How to find the days having a drawdown greater than X bips?

What would be the most idiomatic way to find the days with a drawdown greater than X bips? I again worked my way through some queries, but they become boilerplate ... maybe there is a simpler, more elegant alternative:
q)meta quotes
c   | t f a
----| -----
date| z
sym | s
year| j
bid | f
ask | f
mid | f
then I do:
bips:50;
`jump_in_bips xdesc distinct select date,jump_in_bips from (update date:max[date],jump_in_bips:(max[mid]-min[mid])%1e-4 by `date$date from quotes where sym=accypair) where jump_in_bips>bips;
but this will give me the days on which there has been a jump of that number of bips, and not only the drawdowns.
I can of course put the result above in a temporary table and do several follow-up selects like:
select ... where mid=min(mid),date=X
select ... where mid=max(mid),date=X
to check that the max(mid) came before the min(mid) ... is there a simpler, more idiomatic way?
I think maxs is the key function here: it maintains a running historical maximum, and you can compare each value to that maximum. If you have some table quote which contains a series of mids (mid) and timestamps (date), the following query should return the days on which you saw a drawdown greater than a certain value:
key select by `date$date from quote
where bips<({(maxs[x]-x)%1e-4};mid) fby `date$date
The lambda {(maxs[x]-x)%1e-4} computes, at each point, the drop from the historical maximum in bips, and fby lets you apply the where clause group-wise by date. Grouping with a by on date and taking the key then returns the days on which this occurred.
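To see the mechanics, a quick toy example (the mids are made up):
q)mid:1.30 1.26 1.31
q)maxs mid                 / running historical maximum
1.3 1.3 1.31
q)(maxs[mid]-mid)%1e-4     / drawdown from the running max, in bips
0 400 0f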
If you want to preserve the information for the max drawdown you can use an update instead:
select max draw by date from
(update draw:(maxs[mid]-mid)%1e-4 by date from @[quote;`date;`date$])
where bips<draw
The date is updated separately with a direct modification to quote, to avoid repeated casting.
The difference between the max and min mids for a given date may be either an increase or a drawdown, depending on whether the max mid precedes the min. Also, since a sym column exists, I assume you may have different symbols in the table and want to get drawdowns for all of them.
For example, if there are 3 quotes for a given day and sym, 1.3000 1.2960 1.3010, then the difference between the 2nd and 3rd is 50 pips, but this is an increase.
The following query can be used to get the dates and symbols with a drawdown higher than a given threshold:
select from
(select drawdown: {max maxs[x]-x}mid
by date, sym from quotes)
where drawdown>bips*1e-4
{max maxs[x]-x} gives the maximum drawdown for a given date by subtracting each mid from the maximum of the preceding mids.
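Using the mids from the example above:
q){max maxs[x]-x}1.3 1.296 1.301
0.004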

How to optimize a batch pivotization?

I have a datetime list (which for some reason I call date) containing over 1k datetimes.
adates:2017.10.20T00:02:35.650 2017.10.20T01:57:13.454 ...
For each of these dates I need to select the data from some table, then pivot it by a column t (i.e. expiry), add the corresponding datetime as a column to the pivoted table, and stitch the pivots for all the dates together. Note that I need to be able to identify which pivot corresponds to which date, and that's why I do it one by one:
fPivot:{[adate;accypair]
t1:select from volatilitysurface_smile where date=adate,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`t xkey 0!exec mycols#(stype!mid) by t:t from t1; / pivot: one mid column per stype
t3:`t xkey select distinct t,tenor,xi,volofvol,delta_type,spread from t1;
result:ej[`t;t2;t3];
:result}
I then call this function for every datetime adates as follows:
raze {[accypair;adate] `date xcols update date:adate from fPivot[adate;accypair]}[`EURCHF] each adates;
this takes about 90s. I wonder if there is a better way, e.g. doing one big pivot rather than running one pivot per date and then stitching it all together. The big issue I see is that I have no apparent way to include the date as part of the pivot, and the date cannot be lost, otherwise I can't reconcile the results.
If you haven't been to the pivoting page on the kdb+ wiki (code.kx.com), it may be a good start. There is a section on a general pivoting function that makes some claims to being somewhat efficient.
One user reports:
This is able to pivot a whole day of real quote data, about 25 million quotes over about 4000 syms and an average of 5 levels per sym, in a little over four minutes.
As for general comments, I would say that the ej is unnecessary, as it is a more general version of ij that allows you to specify the key columns. As both t2 and t3 have the same keying, I would instead use:
t2 ij t3
Which may give you a very minor performance boost.
OK, I solved the issue by creating a batch version of the pivot that keeps the date (datetime) field in the group-by needed to pivot, i.e. changing by t:t from ... to by date:date,t:t from .... It went from 90s down to 150 milliseconds.
fBatchPivot:{[adates;accypair]
t1:select from volatilitysurface_smile where date in adates,ccypair=accypair;
mycols:`atm`s10c`s10p`s25c`s25p;
t2:`date`t xkey 0!exec mycols#(stype!mid) by date:date,t:t from t1; / date survives the pivot
t3:`date`t xkey select distinct date,t,tenor,xi,volofvol,delta_type,spread from t1;
result:0!(`date`t xasc t2 ij t3);
:result}
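With that, the earlier per-date loop collapses to a single call (a sketch using the same adates and ccypair as above):
/ one batch pivot instead of one pivot per date
res:fBatchPivot[adates;`EURCHF];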

MDX number of days between shell date dimension and regular date dimension

I have a shell date dimension and a sale date dimension, and I'm having trouble setting up a calculated measure with the difference in days between the 2 dates.
I have tried a number of things, and the calculation always seems to return an error.
An MDX example is:
WITH
MEMBER [Measures].[TimeDate] AS [Date].[Day].currentmember
MEMBER [Measures].[DSODate] AS [DSO Date].[Day].currentmember
MEMBER [Measures].[DaysSinceSale] AS
DateDiff(
"d"
, [Measures].DSODate.MemberValue
, [Measures].TimeDate.MemberValue
)
Select
{[Measures].[DaysSinceSale]} ON COLUMNS,
{[Date].[Day].members} ON ROWS
from [Receivables];
I have tried using DateDiff, and tried just subtracting the 2 dates.
I assume it may have something to do with the 2 date dimensions being in different hierarchies, but I am not really sure how to handle that.
[screenshot: MDX results]
Date conversions can be tricky in MDX, so maybe initially try the following simple approach:
WITH
MEMBER [Measures].[TimeDate] AS [Date].[Day].currentmember
MEMBER [Measures].[DSODate] AS [DSO Date].[Day].currentmember
MEMBER [Measures].[DaysSinceSale] AS
DateDiff(
"d"
, VBA!CDate([Measures].DSODate.MemberValue)
, VBA!CDate([Measures].TimeDate.MemberValue)
)
Select
{[Measures].[DaysSinceSale]} ON COLUMNS,
{[Date].[Day].members} ON ROWS
from [Receivables];
Otherwise you might need to use the key and an approach similar to this:
MDX - Converting Member Value to CDate
I found a way to get this to work ...
The main issue was that I didn't have a crossjoin, as whytheq mentioned.
I also didn't need the custom measures declared at the top.
The other adjustment I made was to utilize the DateKey in the date calculation. That seemed to work in all my tests, and improved performance quite a bit.
WITH
MEMBER [Measures].[DaysSinceSale] AS
[Date].[DateKey].CurrentMember.MemberValue - [DSO Date].[DateKey].CurrentMember.MemberValue
Select
{[Measures].[DaysSinceSale]} ON COLUMNS,
{[Date].[DateKey].Members * [DSO Date].[DateKey].members} ON ROWS
from [Receivables];
If you see any issues that may arise from using DateKey, let me know. For what I am doing, it seemed to pull back the correct date value and allowed me to find the difference between dates without using a DateDiff function.

How to work with date and time in KDB

I tried to work with the dateDtimespan (timestamp) type by subtracting one timestamp from another, but KDB (QPad) always shows 0 as a result - why?
Also, if I have, say, the timestamp 2014.12.11D22:33:00.000000000 in one column and only the time 22:32:00.000000000 in another, how may I remove the date part from the first column so I can subtract the time portion in the second column?
To remove the date you can use the cast operator, $. To get only the time, prefix $ with `time as shown below.
q).z.z
2015.02.23T14:10:33.523
q)`time$.z.z
14:10:30.731
q)t:([]ts:10#.z.N;ti:.z.t-til 10)
q)exec `time$ts-ti from t
00:00:00.000 00:00:00.001 00:00:00.002 00:00:00.003 00:00:00.004 00:00:00.005..
You can see more examples here: http://code.kx.com/q/ref/casting/#cast
I'd prefer casting the timestamp down to a timespan first and then calculating the diff, i.e. (`timespan$p)-n. There is no harm in doing it the other way (`timespan$p-n), but it is less explicit than the former.
q)dt:( [] p:2#2014.12.11D22:33:00.000000000;n:2#22:32:00.000000000)
q)select (`timespan$p)-n from dt
p
--------------------
0D00:01:00.000000000
0D00:01:00.000000000