Iterate over current row values in kdb query - kdb

Consider the table:
q)trade
stock price amt  time
-----------------------------
ibm   121.3 1000 09:03:06.000
bac   5.76  500  09:03:23.000
usb   8.19  800  09:04:01.000
and the list:
q)x: 10000 20000
The following query:
q)select from trade where price < x[first where (x - price) > 100f]
'length
fails with the 'length error shown above. How can I pass the current row's value of price into the expression for each row?
Using price[0] inside the square brackets above works, but that is obviously not what I want. I also tried price[i], which gives the same error.
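
The 'length error is consistent with `x - price` trying to pair a 2-item list with a 3-item column element by element. What the question appears to want is for the lookup to be evaluated once per row. The sketch below illustrates that per-row logic in Python rather than q; the helper names `first_threshold` and `select_rows` are hypothetical, and the data is the table from the question. In q, the usual way to get the same effect is to apply a function to each price instead of to the whole column.

```python
# Per-row lookup: for every price, find the first element of x that exceeds
# the price by more than 100, then keep the row if price is below that element.
trade = [
    {"stock": "ibm", "price": 121.3, "amt": 1000, "time": "09:03:06.000"},
    {"stock": "bac", "price": 5.76, "amt": 500, "time": "09:03:23.000"},
    {"stock": "usb", "price": 8.19, "amt": 800, "time": "09:04:01.000"},
]
x = [10000, 20000]


def first_threshold(price, levels):
    """Return the first level with (level - price) > 100, or None if there is none."""
    for level in levels:
        if level - price > 100.0:
            return level
    return None


def select_rows(rows, levels):
    """Keep rows whose price is below the threshold computed for that row."""
    kept = []
    for row in rows:
        threshold = first_threshold(row["price"], levels)
        # Assumption: a row with no matching level is dropped.
        if threshold is not None and row["price"] < threshold:
            kept.append(row)
    return kept


result = select_rows(trade, x)
print([row["stock"] for row in result])
```

With the sample data every price is far below both levels, so all three rows are kept.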

Related

How can I count elements satisfying a condition in a group, with PostgreSQL

With this query:
SELECT date_trunc('minute', ts) ts, instrument
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
I am grouping rows by minutes but I would like to generate a boolean value that tells me if, in the group, there is at least one row with the timestamp where the seconds are < 10 and at least one row with the timestamp where the seconds are > 50.
In short, something like:
lessThan10 = false
moreThan50 = false
for each row in the one minute group:
if row.ts.seconds < 10 then lessThan10 = true
if row.ts.seconds > 50 then moreThan50 = true
return lessThan10 && moreThan50
What I am trying to achieve is to find out if all the records I aggregate cover the beginning and the end of the minute; it's ok if there are holes here and there, but it's possible the data we capture stops and restarts at second 40 for example and, in that case, I'd like to be able to discard the whole minute.
As the data rate varies quite a lot, I can't check for a minimum number of rows. There may be a better way to achieve this, so I'm open to that as well.
Use EXTRACT() to get the seconds of the min and max values of ts:
SELECT date_trunc('minute', ts) ts, instrument,
EXTRACT(SECOND FROM MIN(ts)) < 10 lessThan10,
EXTRACT(SECOND FROM MAX(ts)) > 50 moreThan50
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
See the demo.
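
The answer relies on the fact that, within one minute, some row has seconds below 10 exactly when the earliest row does, and some row has seconds above 50 exactly when the latest row does. A minimal Python sketch of that reasoning follows; it is not PostgreSQL, the function name `minute_coverage` is hypothetical, and the sample timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def minute_coverage(rows):
    """Map (minute, instrument) to the pair (lessThan10, moreThan50).

    rows is an iterable of (ts, instrument) pairs, with ts a datetime.
    """
    groups = defaultdict(list)
    for ts, instrument in rows:
        # Truncate to the minute, like date_trunc('minute', ts).
        minute = ts.replace(second=0, microsecond=0)
        groups[(minute, instrument)].append(ts)

    result = {}
    for key, stamps in groups.items():
        earliest, latest = min(stamps), max(stamps)
        seconds_min = earliest.second + earliest.microsecond / 1e6
        seconds_max = latest.second + latest.microsecond / 1e6
        result[key] = (seconds_min < 10, seconds_max > 50)
    return result


# Made-up sample data: the second minute only starts at second 40.
sample = [
    (datetime(2020, 1, 1, 9, 0, 3), "A"),
    (datetime(2020, 1, 1, 9, 0, 55), "A"),
    (datetime(2020, 1, 1, 9, 1, 40), "A"),
    (datetime(2020, 1, 1, 9, 1, 58), "A"),
]
coverage = minute_coverage(sample)
for key in sorted(coverage):
    print(key, coverage[key])
```

A minute can then be kept only when both flags are true, which matches the `lessThan10 && moreThan50` condition in the question.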

PostgreSQL product table and sales table

I have the following problem.
I need all the products in a table and at the same time put the sum of the quantities sold of each product.
I need to see each product in the product table with the total sum of sales, according to the date range.
If there is no sales record in that range, it must be zero.
My query is as follows, but it doesn't work.
Product Table : sto_producto
Product sales movement table: sto_movdet
SELECT
sto_producto.pro_codprod AS cod_product,
sto_producto.pro_desc AS name_product,
sum(sto_movdet.mvd_cant) AS total_sale,
AVG(sto_movdet.mvd_costo) AS cost_product_sale
FROM sto_producto
INNER JOIN sto_movdet ON (sto_producto.pro_codprod = sto_movdet.mvd_codprod)
WHERE mvd_fecha BETWEEN '2020301' and '20200716'
GROUP BY pro_codprod, pro_desc
I expect a result similar to:
cod_product  name_product  total_sale  cost_product_sale
0004         mousered      45          $ 2.355
0071         pc laptop     0           $ 1.000
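
One common way to keep products that have no sales in the range is a LEFT JOIN with the date filter placed in the join condition, plus COALESCE to turn a missing sum into zero. The sketch below demonstrates the idea with Python's built-in sqlite3 module rather than PostgreSQL. The sample rows are invented, mvd_fecha is assumed to be stored as a YYYYMMDD string, and the start date 20200301 is an assumed reading of the range in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sto_producto (pro_codprod TEXT PRIMARY KEY, pro_desc TEXT)")
conn.execute(
    "CREATE TABLE sto_movdet (mvd_codprod TEXT, mvd_cant INTEGER, "
    "mvd_costo REAL, mvd_fecha TEXT)"
)

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO sto_producto VALUES (?, ?)",
    [("0004", "mousered"), ("0071", "pc laptop")],
)
conn.executemany(
    "INSERT INTO sto_movdet VALUES (?, ?, ?, ?)",
    [
        ("0004", 20, 2.0, "20200310"),
        ("0004", 25, 3.0, "20200615"),
        ("0071", 7, 1000.0, "20190101"),  # outside the date range
    ],
)

# The date filter sits in the ON clause, so products without matching
# sales still appear, with NULL sales columns that COALESCE turns into 0.
query = """
    SELECT p.pro_codprod AS cod_product,
           p.pro_desc    AS name_product,
           COALESCE(SUM(m.mvd_cant), 0) AS total_sale,
           AVG(m.mvd_costo) AS cost_product_sale
    FROM sto_producto p
    LEFT JOIN sto_movdet m
           ON p.pro_codprod = m.mvd_codprod
          AND m.mvd_fecha BETWEEN ? AND ?
    GROUP BY p.pro_codprod, p.pro_desc
    ORDER BY p.pro_codprod
"""
rows = conn.execute(query, ("20200301", "20200716")).fetchall()
for row in rows:
    print(row)
```

In this sketch the average cost of a product with no sales in the range comes back as NULL (None in Python). Showing a cost for such a product, as in the expected result, would need a separate source for that value.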

Calculate median sales price using 3 variables in Tableau 10

I would like to calculate the median sales price and the median rental price for an apartment in NYC in each of the 5 boroughs: Brooklyn, Bronx, Manhattan, Queens, and Staten Island. In Tableau, sales and rentals are groups of ListPrice. The variables are ListPrice, which is NUMBER (decimal), Type (includes Sales & Rentals), and Borough.
Any help is appreciated.
I tried using Tableau's table calculation feature, but that did not work. I tried:
WINDOW_MEDIAN(SUM([ListPrice])-1, -1)
ERROR: WINDOW_MEDIAN is being called with (float, integer), did you mean
(float, integer, integer)
Data
Type         Borough        ListPrice
RentalType1  Manhattan      $5,000
RentalType2  Bronx          $3,000
RentalType2  Brooklyn       $3,000
SalesType2   Manhattan      $900,000
SalesType1   Brooklyn       $100,000
SalesType1   Bronx          $500,000
SalesType2   Queens         $800,000
SalesType2   Staten Island  $400,000
A table calculation such as WINDOW_MEDIAN takes 3 arguments: the expression, the first row of the partition, and the last row of the partition. Your formula does not supply the last row of the partition.
Run the function for each type within each Borough, so that it is calculated per Borough.
So your formula would be:
WINDOW_MEDIAN(SUM(INT([ListPrice])),FIRST(),LAST())
Are you looking to get the values below?
Here calculation2 is the median value.
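
To make the intended result concrete, here is a small Python sketch (not Tableau) that computes the median ListPrice per borough, separately for sales and rentals, from the data in the question. Collapsing the Type values into "Rental" and "Sales" by their prefix is an assumption about how the groups are defined, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

listings = [
    ("RentalType1", "Manhattan", 5000),
    ("RentalType2", "Bronx", 3000),
    ("RentalType2", "Brooklyn", 3000),
    ("SalesType2", "Manhattan", 900000),
    ("SalesType1", "Brooklyn", 100000),
    ("SalesType1", "Bronx", 500000),
    ("SalesType2", "Queens", 800000),
    ("SalesType2", "Staten Island", 400000),
]


def category(type_name):
    """Collapse a Type value into 'Rental' or 'Sales' (assumed from its prefix)."""
    return "Rental" if type_name.startswith("Rental") else "Sales"


def median_by_borough(rows):
    """Return {(borough, category): median list price}."""
    prices = defaultdict(list)
    for type_name, borough, price in rows:
        prices[(borough, category(type_name))].append(price)
    return {key: median(values) for key, values in prices.items()}


medians = median_by_borough(listings)
for key in sorted(medians):
    print(key, medians[key])
```

With only one listing per borough and category in the sample, each median equals that single price; with more rows the same grouping gives the per-borough medians.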

how to use multiple arguments in kdb where query?

I want to select the max elements from a table within the next 5, 10, 30 minutes, etc.
I suspect this is not possible with multiple elements in the where clause.
Both the plain < and </: fail. My query is below:
select max price from dat where time</: (09:05:00; 09:10:00; 09:30:00)
Any ideas what I am doing wrong here?
The idea is to get the max price for each row within the next 5, 10, 30... minutes of the time in that row, and not just 3 max prices for the entire table.
select max price from dat where time</: time+\:(5 10 30)
This won't work but should give the general idea.
To clarify further, I want to calculate the max price in 5-, 10-, and 30-minute intervals from time[i] of each row of the table. So for each table row, I want the max price within x+5, x+10, x+30 minutes, where x is the time entry in that row.
You could try something like this:
select c1:max price[where time <09:05:00],c2:max price[where time <09:10:00],c3:max price from dat where time< 09:30:00
You can parameterize this query however you like. So if you have a list of times, l:09:05:00 09:10:00 09:15:00 09:20:00 ..., you can create a function using the functional form of the query above that works for different lengths of l, something like:
q)f:{[t]?[dat;enlist (<;`time;max t);0b;(`$"c",/:string til count t)!flip (max;flip (`price;flip (where;((<),/:`time,/:t))))]}
q)f l
You can extend f to take different functions instead of max, work for different tables etc.
This works but takes a lot of time: about 20 seconds for 20k records, which is too much. Is there any way to make it faster?
dat: update tmlst: time+\:mtf*60 from dat;
dat[`pxs]: {[x;y] {[x; ts] raze flip raze {[x;y] select min price from x where time<y}[x] each ts }[x; y`tmlst]} [dat] each dat;
This constructs a step dictionary to map the times to your buckets:
q)-1_select max price by(`s#{((neg w),x)!x,w:(type x)$0W}09:05:00 09:10:00 09:30:00)time from dat
You may also be able to abuse wj:
q)wj[{(prev x;x)}09:05:00 09:10:00 09:30:00;`time;([]time:09:05:00 09:10:00 09:30:00);(delete sym from dat;(max;`price))]
If all your buckets are the same size, it's much easier:
q)select max price by 300 xbar time from dat where time<09:30:00 / 300-second (5-min) buckets
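
The per-row requirement in the question (for each row, the max price within the next 5, 10, and 30 minutes of that row's time) can also be sketched outside q. The Python example below is only an illustration of the logic under stated assumptions: times are sorted and expressed in seconds since midnight, each window includes the row itself and excludes its end point, and the function name `forward_max` and the sample data are hypothetical. It uses a binary search on the sorted times to find the end of each window instead of filtering the whole table once per row.

```python
from bisect import bisect_left


def forward_max(times, prices, windows):
    """For each row, return the max price with t <= time < t + w for every w.

    times must be sorted ascending (seconds since midnight);
    windows are positive lengths in seconds.
    """
    result = []
    for t in times:
        start = bisect_left(times, t)        # first row at time t
        row = []
        for w in windows:
            end = bisect_left(times, t + w)  # first row at or after t + w
            row.append(max(prices[start:end]))
        result.append(row)
    return result


# Made-up sample: 09:00:00, 09:03:00, 09:07:00, 09:20:00, 09:45:00
times = [32400, 32580, 32820, 33600, 35100]
prices = [10, 12, 11, 15, 9]
windows = [5 * 60, 10 * 60, 30 * 60]

for t, maxima in zip(times, forward_max(times, prices, windows)):
    print(t, maxima)
```

Each output row holds the 5-, 10-, and 30-minute forward maxima for the corresponding input row.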

Divide records into groups - quick solution

I need to use an UPDATE command to divide rows (selected by a subselect) in a PostgreSQL table into groups. Each group will be identified by an integer value in one of the table's columns. The groups should all be the same size. The source table contains billions of records.
For example, I need to divide 213 selected rows into groups of 50 records each. The result will be:
1 - 50. row => 1
51 - 100. row => 2
101 - 150. row => 3
151 - 200. row => 4
201 - 213. row => 5
There is no problem doing it with a loop (or with PostgreSQL window functions), but I need to do it very efficiently and quickly. I can't use a sequence in id because there should be gaps in these ids.
I have an idea to use a random integer generator and set it as the default value for a row, but this is not usable when I need to adjust the group size.
The query below should display 213 rows with a group number from 0 to 4. Just add 1 if you want 1 to 5:
SELECT i, (row_number() OVER () - 1) / 50 AS grp
FROM generate_series(1001,1213) i
ORDER BY i;
create temporary sequence s minvalue 0 start with 0;
select *, nextval('s') / 50 grp
from t;
drop sequence s;
I think it has the potential to be faster than @Richard's row_number version, but the difference may not be relevant, depending on the specifics.
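
Both answers rest on the same arithmetic: number the rows from 0 and use integer division by the group size. A short Python sketch of that arithmetic, using the 213-row example from the question (the function name `assign_groups` is hypothetical):

```python
from collections import Counter


def assign_groups(n_rows, group_size, first_group=0):
    """Return the group number for each row position 0..n_rows-1."""
    return [first_group + position // group_size for position in range(n_rows)]


groups = assign_groups(213, 50, first_group=1)
sizes = Counter(groups)
print(sorted(sizes.items()))
```

Groups 1 to 4 each receive 50 rows and group 5 receives the remaining 13, and changing the group size only changes the divisor.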