How to sum multiple elements from single record - kdb

I have table trade:([]time:`time$(); sym:`symbol$(); price:`float$(); size:`long$())
with e.g. 1000 records, with e.g. 10 unique syms. I want to sum the first 4 prices for each sym.
My code looks like:
priceTable: select price by sym from trade;
amountTable: select count price by sym from trade;
amountTable: `sym`amount xcol amountTable;
resultTable: amountTable ij priceTable;
So my new table looks like: resultTable
sym | amount price
-------| --------------------------------------------------------------
instr0 | 106 179.2208 153.7646 155.2658 143.8163 107.9041 195.521 ..
The result of command: res: select sum price from resultTable where i = 1:
price
..
----------------------------------
14.71512 153.2244 154.1642 196.5744
Now, when I want to sum elements I receive: sum res
price| 14.71512 153.2244 154.1642 196.5744 170.6052 61.26522 45.70606
46.9057..
When I want to count elements in res: count res
1
I assume that res is a single record with many values. How can I sum all of those values, or how can I sum just the first four?

You can use "each" to run the sum on each row:
select sum each price from res
Or, if you want to run it on resultTable:
select sum each price from resultTable
To sum the first four prices for each row, use take (#) with each-right (/:):
select sum each 4#/:price from resultTable
Or you could do all of this very easily, in one step:
select COUNT:count i, SUM:sum price, SUM4:sum 4#price by sym from trade
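For readers more at home outside q, the grouping logic of the one-step query can be sketched in Python. This is only an illustration: the sample rows and the function name summarize are made up for the example and are not from the question. Note that Python slicing stops at the end of a short list instead of wrapping around, which is different from how # behaves on short lists.

```python
from collections import defaultdict

def summarize(trades, n=4):
    """Group (sym, price) pairs by sym; return count, total, and sum of the first n prices."""
    prices_by_sym = defaultdict(list)
    for sym, price in trades:
        prices_by_sym[sym].append(price)  # keeps the original row order within each sym
    return {
        sym: {"COUNT": len(p), "SUM": sum(p), "SUM4": sum(p[:n])}
        for sym, p in prices_by_sym.items()
    }

# Hypothetical sample data: 10 trades cycling through the syms a, b, c.
trades = [("abc"[i % 3], 100.0 + i) for i in range(10)]
result = summarize(trades)
```

Here sym a has four prices, while b and c have only three each, so their SUM4 equals their SUM.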

q)trade:([]time:10?.z.d; sym:10#`a`b`c; price:100.+til 10; size:10+til 10)
One caveat with the take (#) operator: if the list has fewer elements than the take count, it treats the list as circular and starts repeating elements. E.g. check out the 4th price for symbols b and c.
q)select 4#price by sym from trade
sym| price
---| ---------------
a | 100 103 106 109
b | 101 104 107 101 //101 - 2 times
c | 102 105 108 102 //102 - 2 times
Using sublist ensures that if the list has fewer elements than the count argument, it just returns the shorter list.
q)select sublist[4;price] by sym from trade
sym| price
---| ----------------
a | 100 103 106 109f
b | 101 104 107f
c | 102 105 108f
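The difference between the two behaviours can be mimicked in a few lines of Python. This is a rough sketch for non-negative counts and non-empty lists only; the helper names take and sublist are just stand-ins for the q primitives and do not cover their other forms (negative counts, dictionaries, tables).

```python
from itertools import cycle, islice

def take(n, xs):
    """Mimic n#xs: keep cycling through xs until n items are produced (xs assumed non-empty)."""
    return list(islice(cycle(xs), n))

def sublist(n, xs):
    """Mimic sublist[n;xs]: return at most n items, never repeating any."""
    return list(xs[:n])

prices_b = [101.0, 104.0, 107.0]    # symbol b has only three prices
wrapped = take(4, prices_b)         # the first price comes round again
shorter = sublist(4, prices_b)      # just the three available prices
```

Summing wrapped would count 101 twice, which is why sublist is the safer choice when some groups may be short.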

Related

Getting a "The sum function requires 1 argument(s)." error

Trying to add 2 values in a column together. The idea is that I get the m1, m2, and m3 values that fit the criteria: area = '000000', ownership = '50', and code = 113 or 114. The values should be 42, 40, and 44 respectively. Until now, I have been doing this in Excel but am trying to take Excel out of this process. There are no NULL values involved in this.
Any idea why I am getting this error?
select sum (m1,m2,m3),
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114');
sample data
area ownership code m1 m2 m3
000000 50 113 40 38 42
000000 50 114 2 2 2
desired result
000000 50 113+114 42 40 44
In SQL, SUM(column) is an aggregate function that sums the values across different rows. If you want to add values from a single row, you can do SELECT m1 + m2 + m3 FROM.... You can also add the column values inside each row and then sum them across rows, like SUM(m1 + m2 + m3). I would rewrite your query as:
SELECT SUM(m1) sum1, SUM(m2) sum2, SUM(m3) sum3
FROM dbo.tablename
WHERE area='000000' AND ownership='50' AND (code='113' OR code='114');
This gives the specific answer below.
desired result
area | ownership| code | m1 | m2 | m3
000000| 50 | 113+114| 42 | 40 | 44
If you also want to see area and ownership, include those columns in the select list and in the GROUP BY clause.
Like:
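-- sum(code) would not produce 113+114; string_agg (SQL Server 2017 or later) concatenates the codes instead
select area, ownership, string_agg(code, '+') within group (order by code) as code, sum(m1) as m1, sum(m2) as m2, sum(m3) as m3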
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114')
group by area, ownership;
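The distinction between summing down a column and adding within a row can be checked with a small script. This sketch uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the dbo schema prefix is dropped, and the column types are assumptions since the question does not show the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the original table definition is not shown in the question.
conn.execute(
    "CREATE TABLE tablename (area TEXT, ownership TEXT, code TEXT, m1 INT, m2 INT, m3 INT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)",
    [("000000", "50", "113", 40, 38, 42), ("000000", "50", "114", 2, 2, 2)],
)

# SUM(col) aggregates one column down all matching rows.
per_column = conn.execute(
    "SELECT SUM(m1), SUM(m2), SUM(m3) FROM tablename "
    "WHERE area = '000000' AND ownership = '50' AND code IN ('113', '114')"
).fetchone()

# m1 + m2 + m3 adds the values inside each row, with no aggregation.
per_row = conn.execute(
    "SELECT code, m1 + m2 + m3 FROM tablename ORDER BY code"
).fetchall()
```

per_column holds one value per column for the two matching rows, while per_row holds one total for each row.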

How to divide two values from the same column but at different rows

I have a table like this:
postcode | value | uns
AA | 10 | 51
AB | 20 | 78
AA | 20 | 78
AB | 50 | 51
and I want to get a result like:
AA | 0.5
AB | 2.5
where the new values are the division for the same postcode between the value with uns = 51 and the value with uns = 78.
How can I do that with Postgres? I already checked window functions and partitions but I am not sure how to do it.
If (postcode, uns) is unique, all you need is a self-join:
select postcode, uns51.value / nullif(uns78.value, 0)
from t uns51
join t uns78 using (postcode)
where uns51.uns = 51
and uns78.uns = 78
If the rows with either t.uns = 51 or t.uns = 78 may be missing, you could use a full join instead (with possibly coalesce() to provide default values for missing rows).
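The self-join can be tried out on the sample rows with Python's built-in sqlite3 module. This is a sketch that uses SQLite as a stand-in for Postgres, and it assumes value is a floating-point column; with an integer column the division would truncate (10 / 20 gives 0) in both databases, so a cast would be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# value is declared REAL here (an assumption) so that the division is not truncated.
conn.execute("CREATE TABLE t (postcode TEXT, value REAL, uns INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("AA", 10, 51), ("AB", 20, 78), ("AA", 20, 78), ("AB", 50, 51)],
)

# Pair each uns = 51 row with the uns = 78 row of the same postcode.
ratios = conn.execute(
    """
    SELECT uns51.postcode, uns51.value / NULLIF(uns78.value, 0) AS ratio
    FROM t AS uns51
    JOIN t AS uns78 ON uns78.postcode = uns51.postcode
    WHERE uns51.uns = 51 AND uns78.uns = 78
    ORDER BY uns51.postcode
    """
).fetchall()
```

NULLIF turns a zero divisor into NULL, so such a postcode would get a NULL ratio instead of raising a division error.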
pozs' solution is nice and simple, nothing wrong with it. Just adding two alternatives:
1. Correlated subquery
SELECT postcode
, value / (SELECT NULLIF(value, 0) FROM t WHERE postcode = uns51.postcode AND uns = 78)
FROM t uns51
WHERE uns = 51;
For only one or a few rows.
2. Conditional aggregate
SELECT postcode
, min(value) FILTER (WHERE uns = 51)/ NULLIF(min(value) FILTER (WHERE uns = 78), 0)
FROM t
GROUP BY postcode;
May be faster when processing most or all of the table.
This can also deal with duplicates per (postcode, uns): use an aggregate function of your choice to pick the right value from each group. For just one row in each group, min() is just as good as max() or sum().
About the aggregate FILTER:
Aggregate columns with additional (distinct) filters

KDB selecting first row from each group

Very silly question... Consider the table t1 below which is sorted by sym.
t1:([]sym:(3#`A),(2#`B),(4#`C);val:10 40 12 50 58 75 22 103 108)
sym val
A 10
A 40
A 12
B 50
B 58
C 75
C 22
C 103
C 108
I want to select the first row corresponding to each sym, like this:
(`sym`val)!(`A`B`C;10j, 50j, 75j)
sym val
A 10
B 50
C 75
There's got to be a one-liner to do this. To get the LAST row for each sym, it would be as simple as select by sym from t1. Any hints?
select first val by sym from t1
Or for multiple columns, you can reverse the table and run your query:
select by sym from reverse t1
You could use fby
q)select from t1 where i=(first;i) fby sym
sym val
-------
A 10
B 50
C 75
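For comparison, the same "first row per group" idea can be sketched outside q. The Python below is only an illustration of the logic, with first_per_group as a made-up helper name; it relies on dictionaries preserving insertion order (Python 3.7+).

```python
def first_per_group(rows, key):
    """Return the first row seen for each distinct value of row[key], in order of appearance."""
    first = {}
    for row in rows:
        first.setdefault(row[key], row)  # later rows with the same key are ignored
    return list(first.values())

# The table t1 from the question, as a list of dicts.
t1 = [
    {"sym": s, "val": v}
    for s, v in zip("AAABBCCCC", [10, 40, 12, 50, 58, 75, 22, 103, 108])
]
firsts = first_per_group(t1, "sym")
```

Replacing setdefault with a plain assignment would keep the last row per sym instead, which corresponds to what select by sym from t1 returns.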

Filter rows based on two fields, where one of them contains a selection criterion

Given the following table
group | weight | category_id | category_name_plus
1 10 100 Ab
1 20 101 Bcd
1 30 100 Efghij
2 10 101 Bcd
2 20 101 Cdef
2 30 100 Defgh
2 40 100 Ab
3 10 102 Fghijkl
3 20 101 Ab
The "weight" is unique for each group and is also an indicator for the order of records inside the group.
What I want is to retrieve one record per group filtered by category_id, but only the record having the highest "weight" inside its "group".
Example for filtering by category_id = 100:
group | weight | category_id | category_name_plus
1 30 100 Efghij
2 40 100 Ab
Example for filtering by category_id = 101:
group | weight | category_id | category_name_plus
1 20 101 Bcd
2 20 101 Cdef
3 20 101 Ab
How can I select just these rows?
I tried fiddling with UNIQUE, MAX(category_id) etc. but I'm still unable to get the correct results. The main problem for me is to get the category_name_plus value here.
I am working with PostgreSQL 9.4(beta 3), because I also need various other niceties like "WITH ORDINALITY" etc.
The rank window function should do the trick:
SELECT "group", weight, category_id, category_name_plus
FROM (SELECT "group", weight, category_id, category_name_plus,
RANK() OVER (PARTITION BY "group"
ORDER BY weight DESC) AS rk
FROM my_table
WHERE category_id = 101) t
WHERE rk = 1
Note:
"group" is a reserved word in SQL, so it has to be surrounded by quotes in order to be used as a column name. It would probably be better, though, to replace it with a non-reserved word, such as "group_id".
Try something like:
SELECT DISTINCT ON ("group") *
from your_table
where category_id = 101
order by "group", weight desc
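The expected outputs in the question imply an order of operations: keep only the rows of the requested category first, and only then pick the heaviest row in each group. A small Python sketch of that logic, using the sample rows from the question and a made-up helper name top_per_group, makes this easy to check against both examples.

```python
def top_per_group(rows, category_id):
    """Filter by category first, then keep the highest-weight row of each group."""
    best = {}
    for group, weight, cat, name in rows:
        if cat != category_id:
            continue  # rows of other categories must not compete for the top weight
        if group not in best or weight > best[group][1]:
            best[group] = (group, weight, cat, name)
    return [best[g] for g in sorted(best)]

rows = [
    (1, 10, 100, "Ab"), (1, 20, 101, "Bcd"), (1, 30, 100, "Efghij"),
    (2, 10, 101, "Bcd"), (2, 20, 101, "Cdef"), (2, 30, 100, "Defgh"), (2, 40, 100, "Ab"),
    (3, 10, 102, "Fghijkl"), (3, 20, 101, "Ab"),
]
cat_100 = top_per_group(rows, 100)
cat_101 = top_per_group(rows, 101)
```

If the ranking were done before the category filter, group 1 would drop out of the category 101 result, because its heaviest row overall belongs to category 100.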

Difference between rows in KDB/Q

I'm new to KDB/Q and have a question around getting the difference between two (not necessarily adjacent) rows.
I have only one table, which looks like the below:
q)tickers:`ibm`bac`dis`gs`ibm`gs`dis`bac
q)pxs:100 50 30 250 110 240 45 48
q)dates:2013.05.01 2013.01.05 2013.02.03 2013.02.11 2013.06.17 2013.06.21 2013.04.24 2013.01.06
q)trades:([tickers;dates];pxs)
q)trades
tickers dates | pxs
------------------| ---
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110
gs 2013.06.21| 240
dis 2013.04.24| 45
bac 2013.01.06| 48
I would like to have either another column in the table that stores the difference between the current and the previous price, or another object with a similar structure. The key question that the result needs to answer is "by how much did the stock change compared to the previous time a price was recorded?"
So far I've tried something along the lines of:
select tickers, dates, pxs - pxs(dates bin (exec dates from trades where tickers = trades.tickers)) from trades
which doesn't really work (at all). Definitely due to trying to do SQL-like queries and having a row-oriented mindset.
Please find below an example of the sought-after answer:
q)trades: do magic with trades
q)trades
tickers dates | pxs | delta
------------------| --- | -----
ibm 2013.05.01| 100 | 0
bac 2013.01.05| 50 | 0
dis 2013.02.03| 30 | 0
gs 2013.02.11| 250 | 0
ibm 2013.06.17| 110 | 10
gs 2013.06.21| 240 | -10
dis 2013.04.24| 45 | 15
bac 2013.01.06| 48 | -2
Thanks for your help,
Dan
q)update delta:{0,1_deltas x}pxs by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100 0
bac 2013.01.05| 50 0
dis 2013.02.03| 30 0
gs 2013.02.11| 250 0
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2
If you do:
select pxs by tickers from trades
you will have a complex column (pxs) which holds the list of prices for each ticker. You can then apply deltas:
select deltas pxs by tickers from trades
which gives you the running difference. The first value in each list is the original pxs though, so you'll need to set the first one to 0.
EDIT
Just re-read and having looked at your result, you'll need to join back to your original trade table
update delta:(0N,(1_ pxs)-(-1_ pxs)) by tickers from trades
Here is how it works:
select pxs by tickers from trades
creates a table in which each row contains a ticker and its list of pxs.
So in every row we have a list:
tickers| pxs
-------| -------
bac | 50 48
dis | 30 45
gs | 250 240
ibm | 100 110
Now we have to apply a function that calculates the delta. The best function is the one mentioned above, deltas, but my version does about the same.
If we use select, we get a table with tickers | list of pxs | list of deltas, but if we use update .. by, the grouped values are ungrouped back into the original rows.
You can get the same results using the prev function. One thing worth highlighting is that prev automatically adds a null (0N) as the first element. This matters because no previous information is available for the first record, whereas putting a 0 there suggests that there has not been any change; it depends on how you want to handle the first record.
q)update delta:pxs-prev[pxs] by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2
Using deltas to get the same results (0N instead of 0):
q)update delta:{0N,1_deltas x}pxs by tickers from trades
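The per-ticker logic behind these queries can also be written out step by step. The Python below is a sketch, not a translation of any particular answer: deltas_by_ticker is a made-up name, dates are kept as plain strings, and the first parameter chooses what to report when a ticker has no earlier price (None to mirror the null, 0 to mirror the zero-filled variant).

```python
def deltas_by_ticker(rows, first=None):
    """Append to each (ticker, date, px) row the change from that ticker's previous price."""
    last_px = {}
    out = []
    for ticker, date, px in rows:
        # No earlier price for this ticker yet: report the chosen placeholder.
        delta = px - last_px[ticker] if ticker in last_px else first
        out.append((ticker, date, px, delta))
        last_px[ticker] = px
    return out

trades = [
    ("ibm", "2013.05.01", 100), ("bac", "2013.01.05", 50),
    ("dis", "2013.02.03", 30), ("gs", "2013.02.11", 250),
    ("ibm", "2013.06.17", 110), ("gs", "2013.06.21", 240),
    ("dis", "2013.04.24", 45), ("bac", "2013.01.06", 48),
]
with_nulls = deltas_by_ticker(trades)
with_zeros = deltas_by_ticker(trades, first=0)
```

Like the q queries, this follows the row order of the table, so the rows should be sorted by date within each ticker first if they are not already chronological.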