kdb q - count subtable between 2 dates - kdb

I have a table
t:`date xasc ([]date:100?2018.01.01+til 100;price:100?til 100;acc:100?`a`b)
and would like to have a new column in t which contains the counts of entries in t where date is in the daterange of the previous 14 days and the account is the same as in acc. For example, if there is a row
date price acc prevdate prevdate1W countprev14
2018.01.10 37 a 2018.01.09 2018.01.03 ?
then countprev14 should contain the number of observations between 2018.01.03 and 2018.01.09 where acc=a
The way I am currently doing it can probably be improved:
f:{[dates;ac;t]count select from t where date>=(dates 0),date<=(dates 1),acc=ac}[;;t]
(f')[(exec date-7 from t),'(exec date-1 from t);exec acc from t]
Thanks for the help

Another method is using a window join (wj1):
https://code.kx.com/q/ref/joins/#wj-wj1-window-join
dates:exec date from t;
d:(dates-7;dates-1);
wj1[d;`acc`date;t;(`acc`date xasc t;(count;`i))]

I think you're looking for something like this:
update count14:{c-0^(c:sums 1&x)y bin y-14}[i;date] by acc from t
this uses sums to get the running counts, bin to find the running count from 14 days prior, and then indexes back into the list of running counts to get the counts from that date.
The difference between the counts then and now are those from the latest 14 days.
Note the lambda here allows us to store the result from the sums easily and avoid unnecessary recomputation.

Related

most performant way to get asof price given a list of timestamps

I have a list of timestamps spanning multiple dates ( no sym, just timestamps). These can be 1000/2000 at times, spanning multiple dates.
What's the most performant way to hit an hdb and get the closest price available for each timestamp?
select from hdbtable where date = x -> can be over 60mm rows.
To do this for each date and then an aj on top is very poor.
Any suggestions are welcome
The most performant way to aj, assuming the HDB follows the standard conventions of date-partitioned with `p# attribute on sym, is
aj[`sym`time;select sym,time,other from myTable where …;select sym,time,price from prices where date=x]
There should be no additional filters/where-clause on the prices table other than date.
You're saying you have no syms just timestamps but what does that mean? Does that mean you want the price of all syms at that timestamp or you want the last price of any sym at that timestamp? The former is easy as you can just join your timestamps to your distinct sym list and use that as the "left" table in the aj. The latter will not be as easy as the HDB data likely isn't fully sorted on time, it's likely sorted by sym and then time. In that case you might have to again join your timestamps to your distinct sym list and aj for the price for all syms and from that result take the one with the max time.
So I guess it depends on a few factors. More info might help.
EDIT: suggestion based on further discussion:
targetTimes:update targetTime:time from ([]time:"n"$09:43:19 10:27:58 13:12:11 15:34:03);
res:aj0[`sym`time;(select distinct sym from trade where date=2021.01.22)cross targetTimes;select sym,time,price from trade where date=2021.01.22];
select from res where not null price,time=(max;time)fby targetTime
sym time targetTime price
----------------------------------------------------
AQMS 0D09:43:18.999937967 0D09:43:19.000000000 4.5
ARNA 0D10:27:57.999842638 0D10:27:58.000000000 76.49
GE 0D15:34:02.999979520 0D15:34:03.000000000 11.17
HAL 0D13:12:10.997972224 0D13:12:11.000000000 18.81
This gives the price of whichever sym is closest to your targetTime. Then you would peach this over multiple dates:
{targetTimes: ...;res:aj0[...];select from res ...}peach mydates;
Note that what's making this complicated is your requirement that it be the price of any sym that's closest to your sym-less targetTimes. This seems strange - usually you would want the price of sym(s) as of a particular time, not the price of anything closest to a particular time.
You can use multithreading to optimize your query, with each thread being assigned a date to process, essentially utilising more than just one core:
{select from hdbtable where date = x} peach listofdates
More info on multithreading can be found here, and more info on peach can be found here

kdb - how to augment table with missing dates in a dynamic/fast way

I have an in-memory table with (date, sym, symType, factor, weight) as columns.
There are cases where this in-memory table once queried for a particular date range is missing an entire date. Could be today's data, or if we're querying for multiple dates, could be a day in the middle, or perhaps multiple days, or the last date, or the beginning.
How can I come up with with a query that fills in those missing dates with the max date up to that point?
So if we have data as follows:
Examples:
.z.D
.z.D-2
.z.D-3
.z.D-6
.z.D-7
I'd like the table to look like this:
.z.D -> .z.D
.z.D-1 -> copy of .z.D-2
.z.D-2 -> .z.D-2
.z.D-3 -> .z.D-3
.z.D-4 -> copy of .z.D-6
.z.D-5 -> copy .z.d-6
.z.D-6 -> .z.D-6
.z.D-7 -> .z.D-7
If in your query today is missing, use previous available date as today.
If in your query the last day is yesterday and it's missing, use the the previous available day as yesterday and so on.
if your last (min date) is missing, use the next available date upwards.
I can do this manually by identifying missing dates and going through missing dates day by day, but I'm wondering if there's a much better way to do this.
aj can work for dates in the middle by constructing a ([] date: listofdesireddates) cross ([] sym: listofsyms) cross ([] sectors: symtype) and then do an aj with the table but it doesn't solve all cases e.g if the missing day is today or at the start.
Can you come up with a reproducible example as to why aj doesn't work? Normal aj usage should solve this problem:
t1:([]date:.z.D-til 8;sym:`ABC);
t2:`date xasc([]date:.z.D-0 2 3 6 7;sym:`ABC;data:"I"$ssr[;".";""]each string .z.D-0 2 3 6 7);
q)aj[`sym`date;t1;t2]
date sym data
-----------------------
2020.07.20 ABC 20200720
2020.07.19 ABC 20200718
2020.07.18 ABC 20200718
2020.07.17 ABC 20200717
2020.07.16 ABC 20200714
2020.07.15 ABC 20200714
2020.07.14 ABC 20200714
2020.07.13 ABC 20200713
/If you need your last date to fill "upwards" then use fills:
update fills data by sym from aj[`sym`date;([]date:.z.D-til 9;sym:`ABC);t2]
A quick guess but a step function with xgroup on the result seems like it will work.
res:getFromTab[dates];
f:{`date xcols:update date:x from y#x};
xgrp:`s#`date xasc `date xgroup res;
raze f[;xgrp] each dates
Performance might be horrible ...

How to find the days having a drawdown greater than X bips?

What would be the most idiomatic way to find the days with a drawdown greater than X bips? I again worked my way through some queries but they become boilerplate ... maybe there is a simpler more elegant alternative:
q)meta quotes
c | t f a
----| -----
date| z
sym | s
year| j
bid | f
ask | f
mid | f
then I do:
bips:50;
`jump_in_bips xdesc distinct select date,jump_in_bips from (update date:max[date],jump_in_bips:(max[mid]-min[mid])%1e-4 by `date$date from quotes where sym=accypair) where jump_in_bips>bips;
but this will give me the days for which there has been a jump in that number of bips and not only the drawdowns.
I can of course put this result above in a temporary table and do several follow up selects like:
select ... where mid=min(mid),date=X
select ... where mid=max(mid),date=X
to check that the max(mid) was before the min(mid) ... is there a simpler, more idiomatic way?
I think maxs is the key function here, which allows you to maintain a running historical maximum, and you can compare your current value to that maximum. If you have some table quote which contains some series of mids (mids) and timestamps (date), the following query should return the days where you saw a drawdown greater than a certain value:
key select by `date$date from quote
where bips<({(maxs[x]-x)%1e-4};mid) fby `date$date
The lambda {(maxs[x]-x)%1e-4} is doing the comparison at each point to the historical maximum and checking if it's greater than bips, and fby lets you apply the where clause group-wise by date. Grouping with a by on date and taking the key will then return the days when this occurred.
If you want to preserve the information for the max drawdown you can use an update instead:
select max draw by date from
(update draw:(maxs[mid]-mid)%1e-4 by date from #[quote;`date;`date$])
where bips<draw
The date is updated separately with a direct modification to quote, to avoid repeated casting.
Difference between max and min mids for given date may be both increase and drawdown. Depending on if max mid precedes min. Also, as far a sym columns exists, I assume you may have different symbols in the table and want to get drawdowns for all of them.
For example if there are 3 quotes for given day and sym: 1.3000 1.2960 1.3010, than the difference between 2nd and 3rd is 50 pips, but this is increase.
The next query can be used to get dates and symbols with drawdown higher than given threshold
select from
(select drawdown: {max maxs[x]-x}mid
by date, sym from quotes)
where drawdown>bips*1e-4
{max maxs[x]-x} gives maximum drawdown for given date by subtracting each mid for maximum of preceding mids.

TABLEAU Calculating a Running DISTINCT COUNT on usernames for last 3 months

Issue:
Need to show RUNNING DISTINCT users per 3-month interval^^. (See goal table as reference). However, “COUNTD” does not help even after table calculation or “WINDOW_COUNT” or “WINDOW_SUM” function.
^^RUNNING DISTINCT user means DISTINCT users in a period of time (Jan - Mar, Feb – Apr, etc.). The COUNTD option only COUNT DISTINCT users in a window. This process should go over 3-month window to find the DISTINCT users.
Original Table
Date Username
1/1/2016 A
1/1/2016 B
1/2/2016 C
2/1/2016 A
2/1/2016 B
2/2/2016 B
3/1/2016 B
3/1/2016 C
3/2/2016 D
4/1/2016 A
4/1/2016 C
4/2/2016 D
4/3/2016 F
5/1/2016 D
5/2/2016 F
6/1/2016 D
6/2/2016 F
6/3/2016 G
6/4/2016 H
Goal Table
Tried Methods:
Step-by-step:
Tried to distribute the problem into steps, but due to columnar nature of tableau, I cannot successfully run COUNT or SUM (any aggregate command) on the LAST STEP of the solution.
STEP 0 Raw Data
This tables show the structure Data, as it is in the original table.
STEP 1 COUNT usernames by MONTH
The table show the count of users by month. You will notice because user B had 2 entries he is counted twice. In the next step we use DISTINCT COUNT to fix this issue.
STEP 2 DISTINCT COUNT by MONTH
Now we can see who all were present in a month, next step would be to see running DISTINCT COUNT by MONTH for 3 months
STEP 3 RUNNING DISTINCT COUNT for 3 months
Now we can see the SUM of DISTINCT COUNT of usernames for running 3 months. If you turn the MONTH INTERVAL to 1 from 3, you can see STEP 2 table.
LAST STEP Issue Step
GOAL: Need the GRAND TOTAL to be the SUM of MONTH column.
Request:
I want to calculate the SUM of '1' by MONTH. However, I am using WINDOW function and aggregating the data that gave me an Error.
WHAT I NEED
Jan Feb March April May Jun
3 3 4 5 5 6
WHAT I GOT
Jan Feb March April May Jun
1 1 1 1 1 1
My Output after tried methods: Attached twbx file. DISTINCT_count_running_v1
HELP taken:
https://community.tableau.com/thread/119179 ; Tried this method but stuck at last step
https://community.tableau.com/thread/122852 ; Used some parts of this solution
The way I approached the problem was identifying the minimum login date for each user and then using that date to count the distinct number of users. For example, I have data in this format. I created a calculated field called Min User Login Date as { FIXED [User]:MIN([Date])} and then did a CNTD(USER) on Min User Login Date to get the unique user count by date. If you want running total, then you can do quick table calculation on Running Total on CNTD(USER) field.
You need to put Month(date) and count(username) in the columns then you will get result what you expect.
See screen below

Q (KDB) selecting today's date within date range

I am trying to set up an dynamic threshold by different user, but only return result from today's date. I was able to return all the records from past 30 days, but I am having trouble only outputting today's date based on the calculation from past 30 days.. I am new to q language and really having a trouble with this simple statement :( (have tried and/or statement but not executing..) Thank you for all the help in advance!
select user, date, real*110 from table where date >= .z.D - 30, real> (3*(dev;real) fby user)+((avg;real) fby user)
Are you saying that you want to determine if any of todays "real" values are greater than 3 sigma based on the past 30 days? If so (without knowing much about your table structure) I'm guessing you could use something like this:
q)t:t,update user:`user2,real+(.0,39#10.0) from t:([] date:.z.D-til 40;user:`user1;real:20.1,10.0+39?.1 .0 -.1);
q)sigma:{avg[y]+x*dev y};
q)select from t where date>=.z.D-30, ({(.z.D=x`date)&x[`real]>sigma[3]exec real from x where date<>.z.D};([]date;real)) fby user
date user real
---------------------
2016.03.21 user1 20.1