KDB selecting first row from each group

Very silly question... Consider the table t1 below which is sorted by sym.
t1:([]sym:(3#`A),(2#`B),(4#`C);val:10 40 12 50 58 75 22 103 108)
sym val
A 10
A 40
A 12
B 50
B 58
C 75
C 22
C 103
C 108
I want to select the first row corresponding to each sym, like this:
flip (`sym`val)!(`A`B`C;10 50 75)
sym val
A 10
B 50
C 75
There's got to be a one-liner to do this. To get the LAST row for each sym, it would be as simple as select by sym from t1. Any hints?

select first val by sym from t1
Or for multiple columns, you can reverse the table and run your query:
select by sym from reverse t1

You could use fby
q)select from t1 where i=(first;i) fby sym
sym val
-------
A 10
B 50
C 75
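To see why the fby filter works, it can help to look at the intermediate vector it produces. The following is a minimal sketch that reuses t1 from the question; the name firstIdx is only for illustration, and til count t1 stands in for the virtual column i.

```q
t1:([]sym:(3#`A),(2#`B),(4#`C);val:10 40 12 50 58 75 22 103 108)

/ for every row, the index of the first row in that row's sym group
firstIdx:(first;til count t1) fby t1`sym

/ a row is kept when its own index equals its group's first index
t1 where firstIdx=til count t1
```

Here firstIdx comes out as 0 0 0 3 3 5 5 5 5, so only rows 0, 3 and 5 pass the filter.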

Related

Utility like except for tables in kdb

As we have except function for lists in kdb to find the elements which are present in one list and not in another, similarly do we have any utility to extract the rows present in one table and not in another based on a column?
Eg: I have two tables:
l:([]c1:`a`b`c`d;c2:10 20 30 40)
r:([]c1:`a`a`a`b`b;c3:100 200 300 400 50)
In column c1 of table l, the values c and d are not present in column c1 of table r.
Do we have any utility in kdb which can be used to get output like below?
c1 c2
-----
c 30
d 40
I got the output using -
select from l where c1 in l[`c1] except r`c1
But, I'm searching for better/optimised solution/utility to get the same output.

I don't think there's anything wrong with your current implementation but you could use drop (aka _) on a keyed table for a more succinct approach:
q)#[1#`c1;r]_1!l
c1| c2
--| --
c | 30
d | 40
This also remains pretty neat when the "key" is more than one column:
l0:([]c0:`x`y`z`w;c1:`a`b`c`d;c2:10 20 30 40)
r0:([]c0:`y`x`x`x`y;c1:`a`a`a`b`b;c3:100 200 300 400 50)
q)#[`c0`c1;r0]_2!l0
c0 c1| c2
-----| --
z c | 30
w d | 40

A more functional form would be this:
{cl:cols[x]inter cols y;x where not(cl#x)in cl#y}[l;r]
c1 c2
-----
c 30
d 40
This works even if you don't know which columns to match on: cols[x] inter cols y at the start finds the columns common to both tables. It also works on unkeyed tables.
Although in this specific case, the following would be a little bit faster:
l where not l[`c1] in r[`c1]
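As a rough illustration of the lambda above, the steps can be run one at a time on the l and r tables from the question. This sketch assumes nothing beyond those two tables; cl is the same local name the lambda uses.

```q
l:([]c1:`a`b`c`d;c2:10 20 30 40)
r:([]c1:`a`a`a`b`b;c3:100 200 300 400 50)

cl:cols[l] inter cols r        / columns common to both tables, here just c1

/ row-wise membership test: one boolean per row of l
mask:(cl#l) in cl#r

/ keep the rows of l that have no match in r
l where not mask
```

The mask is 1100b for this data, so the rows with c1 equal to c and d are the ones that survive.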

How can I delete a column by index from a kdb table?

For example how would you delete the first column from the following table:
q)t: ([] a: (2018.09.25; 2018.09.25; 2018.09.25); b: `ABC`XYZ`BAC ; c: (10 20 30))
q)t
a b c
-----------------
2018.09.25 ABC 10
2018.09.25 XYZ 20
2018.09.25 BAC 30
The expected result:
b c
---------
ABC 10
XYZ 20
BAC 30
It is possible to use delete a from t but I would like to be able to delete without knowing the exact column name beforehand.

You could use a functional delete:
q){[t;index]![t;();0b;enlist cols[t]index]}[t;0]
b c
------
ABC 10
XYZ 20
BAC 30
https://code.kx.com/q/ref/funsql/#delete
Use parse in order to see what the q-sql statement looks like in functional form:
q)parse"delete a from t"
!
`t
()
0b
,,`a

You could use
{(_/[cols x;desc y])#x}[t;0 2]
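This takes the column names of your table and the indices you want to drop, and uses drop (_) with over (/) to remove those names one at a time. The indices are sorted in descending order so that an earlier drop does not shift the position of a later one; the remaining columns are then taken from the table. If you wanted to remove only one index, you'd have to enlist it, like so: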
{(_/[cols x;desc y])#x}[t;enlist 0]
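The reason for sorting the indices with desc can be seen on a plain list of column names. This is only a sketch; the list c below is made up for the illustration and stands in for cols t.

```q
c:`a`b`c            / column names; suppose indices 0 and 1 should be removed

/ descending order: each drop leaves the lower indices where they were
good:_/[c;1 0]      / drops b, then a

/ ascending order: after dropping index 0, index 1 points at c instead of b
bad:_/[c;0 1]       / drops a, then c
```

With the descending order only c is left, as intended; with the ascending order b is left by mistake.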

If your table is not keyed, you can drop the column by name with _, the same way you would drop a key from a dictionary:
q) f:{[t;ind] enlist[cols[t] ind]_t}
q) f[t;0]
b c
------
ABC 10
XYZ 20
BAC 30

Using flip and drop :
q)flip 1_flip 0!t
b c
------
ABC 10
XYZ 20
BAC 30

Getting a "The sum function requires 1 argument(s)." error

Trying to add 2 values in a column together. The idea is that I get the m1, m2, and m3 values that fit the criteria: area = '000000', ownership = '50', and code = 113 or 114. The values should be 42, 40, and 44 respectively. Until now, I have been doing this in Excel but am trying to take Excel out of this process. There are no NULL values involved in this.
Any idea why I am getting this error?
select sum (m1,m2,m3),
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114');
sample data
area ownership code m1 m2 m3
000000 50 113 40 38 42
000000 50 114 2 2 2
desired result
000000 50 113+114 42 40 44

In SQL, SUM(column) is an aggregate function that sums the values across different rows. If you want to add values from a single row, you can do SELECT m1 + m2 + m3 FROM.... You can also add the column values inside each row, then sum that across rows with SUM(m1 + m2 + m3). I would rewrite your query as:
SELECT SUM(m1) sum1, SUM(m2) sum2, SUM(m3) sum3
FROM dbo.tablename
WHERE area='000000' AND ownership='50' AND (code='113' OR code='114');

To get exactly the result below:
desired result
area | ownership| code | m1 | m2 | m3
000000| 50 | 113+114| 42 | 40 | 44
If you also want to see area and ownership, those columns have to appear both in the select list and in the group by clause. Note that sum(code) will not produce the text 113+114; at best it adds the code values numerically.
Like:
select area, ownership, sum(code), sum(m1), sum(m2), sum(m3)
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114')
group by area, ownership;

kdb+: group by and sum over multiple columns

Consider the following data:
table:
time colA colB colC
-----------------------------------
11:30:04.194 31 250 a
11:30:04.441 31 280 a
11:30:14.761 31.6 100 a
11:30:21.324 34 100 a
11:30:38.991 32 100 b
11:31:20.968 32 100 b
11:31:56.922 32.2 1000 b
11:31:57.035 32.6 5000 c
11:32:05.810 33 100 c
11:32:05.810 33 100 a
11:32:14.461 32 300 b
How can I sum colB over each consecutive run of rows with the same colC, without losing the time order?
So the output would be:
first time avgA sumB colC
-----------------------------------
11:30:04.194 31.9 730 a
11:30:38.991 32.07 1200 b
11:31:57.035 32.8 5100 c
11:32:05.810 33 100 a
11:32:14.461 32 300 b
What I have so far:
select by time from (select first time, avg colA, sum colB by colC, time from table)
But the output is not grouped by colC. What should the query look like?

How about this?
get select first time, avg colA, sum colB, first colC by sums colC<>prev colC from table

A slightly different way to achieve this using differ :
value select first time, avg colA, sum colB , first colC by g:(sums differ colC) from table
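Both answers rely on turning "colC changed here" into a running group number. A small sketch of that idea follows, using only the colB and colC values from the question; the table name t and the column names g and sumB are chosen for the illustration.

```q
t:([]colB:250 280 100 100 100 100 1000 5000 100 100 300;colC:`a`a`a`a`b`b`b`c`c`a`b)

/ 1b wherever colC differs from the previous row (the first row always counts)
chg:differ t`colC

/ running total of the change flags gives one id per consecutive run
runId:sums chg

/ group by the run id, then drop the key with value
res:value select sumB:sum colB, first colC by g:sums differ colC from t
```

The run ids come out as 1 1 1 1 2 2 2 3 3 4 5, so the later a and b rows form their own groups instead of being merged with the earlier ones.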

How to sum multiple elements from single record

I have table trade:([]time:`time$(); sym:`symbol$(); price:`float$(); size:`long$())
with e.g. 1000 records, with e.g. 10 unique syms. I want to sum the first 4 prices for each sym.
My code looks like:
priceTable: select price by sym from trade;
amountTable: select count price by sym from trade;
amountTable: `sym`amount xcol amountTable;
resultTable: amountTable ij priceTable;
So my new table looks like: resultTable
sym | amount price
-------| --------------------------------------------------------------
instr0 | 106 179.2208 153.7646 155.2658 143.8163 107.9041 195.521 ..
The result of command: res: select sum price from resultTable where i = 1:
price
..
----------------------------------
14.71512 153.2244 154.1642 196.5744
Now, when I want to sum elements I receive: sum res
price| 14.71512 153.2244 154.1642 196.5744 170.6052 61.26522 45.70606
46.9057..
When I want to count elements in res: count res
1
I assume that res is a single record with many values. How can I sum all of those values, or how can I sum just the first four?

You can use "each" to run the sum on each row:
select sum each price from res
Or if you want to run on resultTable:
select sum each price from resultTable
To sum the first four prices for each row, use a dyadic each-right:
select sum each 4#/:price from resultTable
Or you could do all of this very easily, in one step:
select COUNT:count i, SUM:sum price, SUM4:sum 4#price by sym from trade

q)trade:([]time:10?.z.d; sym:10#`a`b`c; price:100.+til 10; size:10+til 10)
One caveat with the take (#) operator: if the list has fewer elements than the take count, it treats the list as circular and starts returning repeated elements. E.g. check out the 4th price for symbols b and c.
q)select 4#price by sym from trade
sym| price
---| ---------------
a | 100 103 106 109
b | 101 104 107 101 //101 - 2 times
c | 102 105 108 102 //102 - 2 times
Using sublist ensures that if the list has fewer elements than the count argument, it just returns the shorter list.
q)select sublist[4;price] by sym from trade
sym| price
---| ----------------
a | 100 103 106 109f
b | 101 104 107f
c | 102 105 108f
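Putting the two pieces together, the difference shows up directly in the sums. The sketch below assumes the same sym and price data as the trade table above (time and size are left out for brevity); the column names take4 and sub4 are made up for the comparison.

```q
trade:([]sym:10#`a`b`c;price:100.+til 10)

/ take4 wraps around for b and c (only three prices each); sub4 does not
res:select take4:sum 4#price, sub4:sum sublist[4;price] by sym from trade
```

For a both columns give 418, while for b and c the take4 column counts the first price twice (413 and 417) and the sub4 column gives 312 and 315.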
