select from table distinct syms based on newest data - kdb

I have a table and I want to find out the most efficient way of filtering this table.
time date sym Capture
12:00:00.000 2022.09.12 `AAPL 2022.09.12D15:30:00.000000000
10:00:00.000 2022.09.10 `MSFT 2022.09.10D11:20:00.000000000
14:00:00.000 2022.09.12 `AAPL 2022.09.12D14:20:00.000000000
0Nt 2022.09.11 `AAPL 2022.09.11D10:05:00.000000000
16:00:00.000 2022.09.11 `AAPL 2022.09.12D17:20:00.000000000
0Nt 2022.09.11 `MSFT 2022.09.11D11:30:00.000000000
0Nt 2022.09.11 `MSFT 2022.09.11D15:00:00.000000000
The result must keep the same column order and be an unkeyed table (type 98h).
I want to return distinct syms based on newest data in the order date --> time --> Capture.
Therefore, this table should return:
time date sym Capture
14:00:00.000 2022.09.12 `AAPL 2022.09.12D14:20:00.000000000
0Nt 2022.09.11 `MSFT 2022.09.11D15:00:00.000000000
Thanks for the help!

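Sort ascending by date, then time, then Capture, so that the newest row of each sym comes last. select by sym then keeps that last row, 0! unkeys the result and xcols restores the original column order:
cols[table] xcols 0!select by sym from `date`time`Capture xasc table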
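To see why this works, here is a minimal sketch run against the sample data from the question. It assumes the data is held in a variable named table, as in the one-liner above; the names tm, dt, sy, cp, sorted and res are introduced only for illustration.

```q
/ sample rows copied from the question; tm, dt, sy and cp are illustrative names
tm:12:00:00.000,10:00:00.000,14:00:00.000,0Nt,16:00:00.000,0Nt,0Nt
dt:2022.09.12 2022.09.10 2022.09.12 2022.09.11 2022.09.11 2022.09.11 2022.09.11
sy:`AAPL`MSFT`AAPL`AAPL`AAPL`MSFT`MSFT
cp:2022.09.12D15:30:00.0 2022.09.10D11:20:00.0 2022.09.12D14:20:00.0 2022.09.11D10:05:00.0 2022.09.12D17:20:00.0 2022.09.11D11:30:00.0 2022.09.11D15:00:00.0
table:([]time:tm;date:dt;sym:sy;Capture:cp)

/ ascending sort puts the newest row of each sym last (nulls sort first)
sorted:`date`time`Capture xasc table

/ select by with no columns keeps the last row per sym; 0! unkeys the result
res:cols[table] xcols 0!select by sym from sorted
```

On this data res should hold the two rows the question asks for (AAPL at 14:00 on 2022.09.12 and MSFT with a null time on 2022.09.11), and type res should be 98h.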

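fby could also do it. Note that this ranks rows by date+time and falls back to Capture only where time is null, which is not quite the same as sorting by date, then time, then Capture, although it gives the same result on this data: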
q)t:([]time:12:00:00.000 10:00:00.000 14:00:00.000 0N 16:00:00.000 0N 0N;date:2022.09.12 2022.09.10 2022.09.12 2022.09.11 2022.09.11 2022.09.11 2022.09.11;sym:`AAPL`MSFT`AAPL`AAPL`AAPL`MSFT`MSFT;Capture:2022.09.12D15:30:00.0 2022.09.10D11:20:00.0 2022.09.12D14:20:00.0 2022.09.11D10:05:00.0 2022.09.12D17:20:00.0 2022.09.11D11:30:00.0 2022.09.11D15:00:00.0);
q)select from t where({x=max x};Capture^date+time)fby sym
time         date       sym  Capture
----------------------------------------------------------
14:00:00.000 2022.09.12 AAPL 2022.09.12D14:20:00.000000000
             2022.09.11 MSFT 2022.09.11D15:00:00.000000000

Related

query for selecting N records

I have a table tab that has cols date,sym,value and is sorted from oldest date to the recent.
I am trying to select the last N records for each sym and am not sure how to write the query. I know that I can select on date being within a range, but I need the last N rows per sym regardless of whether the sym appears on consecutive dates.
You could do this with fby and the virtual row number column i:
https://code.kx.com/q/ref/fby/
q){ select from tab where ({y in x#y}[x];i) fby sym }[-2]
date       sym  time                          src price size
------------------------------------------------------------
2014.04.21 AAPL 2014.04.21D16:29:03.253000000 N   24.98 3561
2014.04.21 AAPL 2014.04.21D16:29:03.558000000 N   24.98 2733
2014.04.21 CSCO 2014.04.21D16:28:56.265000000 O   35.6  8390
2014.04.21 CSCO 2014.04.21D16:29:44.572000000 L   35.61 2286
2014.04.21 DELL 2014.04.21D16:29:35.374000000 L   29.57 1444
2014.04.21 DELL 2014.04.21D16:29:39.979000000 N   29.56 216
2014.04.21 GOOG 2014.04.21D16:29:50.569000000 N   41.87 722
2014.04.21 GOOG 2014.04.21D16:29:58.633000000 O   41.9  437
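How the fby filter picks rows can be seen on a small made-up table. The table contents, the val column and the wrapper name lastN are assumptions for illustration (value is avoided as a column name because it is also a q keyword).

```q
/ hypothetical data: three rows per sym, oldest first
tab:([]date:2021.01.01+til 6;sym:`a`b`a`b`a`b;val:10 20 30 40 50 60)

/ for each sym, fby passes that sym's row numbers (i) to the lambda;
/ n#y with negative n keeps the last rows, and in flags the ones to retain
lastN:{[n] select from tab where ({y in x#y}[n];i) fby sym}

r:lastN -2
```

Here r should keep the rows with val 30 40 50 60 in their original order, since fby filters in place rather than regrouping the table.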
Edit: A faster way would be to use a functional select with the fifth argument n (the number of records), run once for each sym.
raze{
//[table;where;by;cols;rows]
?[tab;enlist (in;`sym;enlist x);0b;();y]
}[;-2]'[distinct tab[`sym]]
https://code.kx.com/q/basics/funsql/
Matt's suggestions using an fby and functional select are best if you want all columns in the table returned. If you only need the date, sym & price columns returned you could use
q)ungroup select -2#date,-2#price by sym from trade
sym  date       price
----------------------
APPL 2021.03.13 111.77
APPL 2021.03.13 111.85
CAT  2021.03.13 246
CAT  2021.03.13 246.27
GOOG 2021.03.13 206.24
GOOG 2021.03.13 206.21
NYSE 2021.03.13 60.67
NYSE 2021.03.13 60.97
Note that this can become tedious when selecting a large number of columns. In those cases it's better to stick with Matt's suggestions.

KDB query returns more 2 columns instead of 1 for max filter

I just want to create one report where I need max price for each symbol so I wrote following query which works fine on PROD but fails on UAT. So just wanted to know if following query is the appropriate or not.
select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier
The above query returns 2 rows for each symbol instead of 1. The following is the result of the inner query, i.e. select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31)
t:([]time:8#2019.03.11D09:00+"v"$0 4 8 10;sym:8#`GOOG`GOOG`MSFT`MSFT;src:8#`L`O`N`O;price:36.01 35.01 35.5 31.1 39.01 38.01 33.5 32.1;size:8#1427 708 7810 1100)
time                          sym  src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L   36.01
2019.03.11D09:00:04.000000000 GOOG O   35.01
2019.03.11D09:00:08.000000000 MSFT N   35.5
2019.03.11D09:00:10.000000000 MSFT O   31.1
2019.03.11D09:00:00.000000000 GOOG L   39.01
2019.03.11D09:00:04.000000000 GOOG O   38.01
2019.03.11D09:00:08.000000000 MSFT N   33.5
2019.03.11D09:00:10.000000000 MSFT O   32.1
And output for select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier is :
t[0,2,4,7]
time                          sym  src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L   36.01
2019.03.11D09:00:08.000000000 MSFT N   35.5
2019.03.11D09:00:00.000000000 GOOG L   39.01
2019.03.11D09:00:10.000000000 MSFT O   32.1
I suspect that something is missing from the dataset you have provided in the question. The prices returned by your inner query are all floats with fractional parts, whereas size is a long, so it doesn't make sense that size=(max;price) returns any rows.
To answer your question in the most general sense, the max price by sym is given by
select from t where price=(max;price) fby sym
Applying this to the inner result you have provided
q)select from t where price=(max;price) fby sym
time                          sym  src price size
-------------------------------------------------
2019.03.11D09:00:08.000000000 MSFT N   35.5  7810
2019.03.11D09:00:00.000000000 GOOG L   39.01 1427

Hourly data from a table containing per minute data

I have a table which contains three columns in a PostgreSQL database. The three columns are timestamp, tag and value. In this table data is automatically inserted from log file generated by SCADA server. I need hourly data from this table. (20:00:00, 21:00:00)
timestamp tag value
2019-06-06 06:00:00 x 123
2019-06-06 06:00:00 y 456
2019-06-06 06:01:00 x 123
2019-06-06 06:01:00 y 656
2019-06-06 06:02:00 x 123
2019-06-06 06:02:00 y 333
.......
.......
2019-06-06 06:59:00 x 2232
2019-06-06 06:59:00 y 654
2019-06-06 07:00:00 x 5645
2019-06-06 07:00:00 y 54654
I want the data at exactly 2019-06-06 06:00:00, 07:00:00 and so on. The table is updated every minute, so I can't hard-code the timestamps in the WHERE clause.
Desired Output should be like this.
timestamp tag value
2019-06-06 06:00:00 x 123
2019-06-06 06:00:00 y 456
2019-06-06 07:00:00 x 5645
2019-06-06 07:00:00 y 54654
...
.....
......
2019-06-09 07:00:00 x 5645
2019-06-09 07:00:00 y 54654
It seems you only want the rows that were recorded exactly on the hour.
You can do that with a simple WHERE clause:
select *
from the_table
where date_trunc('hour', "timestamp") = "timestamp";
date_trunc "truncates" the timestamp value to the given granularity. So minutes, seconds and milliseconds will be set to zero.
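Alternatively, you can extract the hour part of the timestamp and group the result by hour. Note that this collapses the same hour from different days into one group, and that STRING_AGG needs text input, so a non-text value column has to be cast:
SELECT date_part('hour', "timestamp") AS hour, STRING_AGG(tag, ','), STRING_AGG(value::text, ',')
FROM your_table
GROUP BY hour;
date_part extracts one field (here the hour) from a timestamp, and STRING_AGG combines values from different rows into one string.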

Append columns to empty table - Q/KDB+

I'm pulling data from a source that returns tick data for stocks (timespan + float prices).
I need to build 1 table that has the tick data for each stock, while inserting new timespan index values for each one. Example:
AAPL:
t0 101.20
t3 102.10
GOOG:
t1 850.50
t2 860.10
Table:
    AAPL    GOOG
t0  101.20  NA
t1  NA      850.50
t2  NA      860.10
t3  102.10  NA
There would be many symbols, so I can't just manually type AAPL, GOOG etc.
While it would be possible to set up a table like the one you have described, it would not be advisable. It would be better to add a column that records the stock for each row, sym in this case:
t                             sym  price
-------------------------------------------
2018.02.05D14:11:09.241245000 AAPL 101.7808
2018.02.05D14:11:09.241246000 GOOG 103.0177
2018.02.05D14:11:09.241246000 AAPL 107.8503
2018.02.05D14:11:09.241247000 GOOG 105.3471
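If the wide layout from the question is still needed for display, it can be derived from the long table on demand. A minimal sketch, assuming a long table named ticks with a timespan column t; the names ticks, syms and wide are illustrative, the AAPL and GOOG values come from the question's example, and the MSFT row is hypothetical.

```q
/ long format: one row per tick, with the stock recorded in sym
ticks:([]t:0D00:00:00 0D00:00:03 0D00:00:01 0D00:00:02;sym:`AAPL`AAPL`GOOG`GOOG;price:101.2 102.1 850.5 860.1)

/ a tick for a new symbol is simply appended; no schema change is needed
/ (the MSFT row is made up for illustration)
`ticks insert (0D00:00:04;`MSFT;55.5)

/ pivot to one column per symbol; missing prices come out as nulls (0n)
syms:asc exec distinct sym from ticks
wide:exec syms#(sym!price) by t:t from ticks
```

The result wide is a table keyed on t with one price column per symbol, so the list of symbols never has to be typed by hand.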

How to generate datatable by iterating through multiple lists? (KDB)

I have a function quotes[ticker;startDate;endDate], and a function indexConstituents[index;startDate;endDate] that yield the below:
daterange: 2017.12.05,2017.12.06;
quotes'[AAPL;daterange]
date time sym price
2017.12.05 09:45 AAPL 101.20
2017.12.06 09:45 AAPL 102.30
quotes'[GOOG;daterange]
date time sym price
2017.12.05 10:00 GOOG 800.50
quotes'[BBRY;daterange]
date time sym price
2017.12.06 11:15 BBRY 02.10
and
indexConstituents'[DJIA;daterange]
date sym shares divisor
2017.12.05 AAPL 20 2
2017.12.05 GOOG 5 1
2017.12.06 AAPL 10 1.5
2017.12.06 BBRY 100 1
I need a way to run the indexConstituents function as normal to yield a list of constituents over a set of days (as in the second table above), then fetch the data from table 1 for each constituent. Finally, I need to join the data from both tables to yield the below:
data:
date time sym price shares divisor
2017.12.05 09:45 AAPL 101.20 20 2
2017.12.06 09:45 AAPL 102.30 10 1.5
2017.12.05 10:00 GOOG 800.50 5 1
2017.12.06 11:15 BBRY 02.10 100 1
Code for the first two tables:
([] date:2017.12.05,2017.12.06; time:09:45,09:45; sym:`AAPL,`AAPL; price:101.20,102.30)
([] date:2017.12.05,2017.12.05,2017.12.06,2017.12.06; sym:`AAPL,`GOOG,`AAPL,`BBRY; shares:20f,5f,10f,100f; divisor:2f,1f,1.5f,1f)
I think the best approach is to assign the resultant table from indexConstituents'[DJIA;daterange] to a variable, so that we can then pull out the sym column and apply distinct to it.
You can then use that list of syms as your first argument to the quotes.
Finally join the two resultant tables together.
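/ each (') returns one table per date, so raze them into a single table
idx:raze indexConstituents'[DJIA;daterange];
/ one call per sym and date pair gives a list of lists of tables, so flatten twice
q:raze raze quotes\:/:[distinct idx`sym;daterange];
q lj 2!idx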
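The final join can be checked on the sample data from the question. This sketch skips the two user functions and builds the tables directly: allQuotes stands in for the combined output of quotes, idx for a single table of constituents, and data for the joined result (all three names are assumptions for illustration).

```q
/ quote rows for all constituents, as listed in the question
allQuotes:([]date:2017.12.05 2017.12.06 2017.12.05 2017.12.06;time:09:45 09:45 10:00 11:15;sym:`AAPL`AAPL`GOOG`BBRY;price:101.2 102.3 800.5 2.1)

/ constituents table from the question
idx:([]date:2017.12.05 2017.12.05 2017.12.06 2017.12.06;sym:`AAPL`GOOG`AAPL`BBRY;shares:20 5 10 100f;divisor:2 1 1.5 1f)

/ 2!idx keys idx on its first two columns (date and sym);
/ lj then looks up shares and divisor for each quote row by that pair
data:allQuotes lj 2!idx
```

Because lj is a left join, every quote row is kept, and any row without a matching date and sym in idx would get null shares and divisor.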
Hope this helps!