Just as kdb+ has the except function for lists, which returns the elements present in one list but not in another, is there a utility to extract the rows present in one table but not in another, based on a column?
Eg: I have two tables:
l:([]c1:`a`b`c`d;c2:10 20 30 40)
r:([]c1:`a`a`a`b`b;c3:100 200 300 400 50)
For column c1, table l has rows with values c and d that are not present in column c1 of table r.
Is there a utility in kdb+ that can be used to get output like the below?
c1 c2
-----
c 30
d 40
I got the output using:
select from l where c1 in l[`c1] except r`c1
But I'm looking for a better or more optimised solution, ideally a built-in utility, that gives the same output.
I don't think there's anything wrong with your current implementation, but you could use drop (aka _) on a keyed table for a more succinct approach:
q)#[1#`c1;r]_1!l
c1| c2
--| --
c | 30
d | 40
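Here #[1#`c1;r] takes just the c1 column of r as a single-column table, which drop (_) then treats as the keys to remove from the keyed table 1!l:
q)#[1#`c1;r]
c1
--
a
a
a
b
b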
This also remains pretty neat when the key is more than one column:
l0:([]c0:`x`y`z`w;c1:`a`b`c`d;c2:10 20 30 40)
r0:([]c0:`y`x`x`x`y;c1:`a`a`a`b`b;c3:100 200 300 400 50)
q)#[`c0`c1;r0]_2!l0
c0 c1| c2
-----| --
z c | 30
w d | 40
A more functional form would be this:
{cl:cols[x]inter cols y;x where not(cl#x)in cl#y}[l;r]
c1 c2
-----
c 30
d 40
This works even if you don't know which columns to match on, because cols[x] inter cols y at the start obtains the common columns between the two tables. It also works without any columns being keyed.
Although in this specific case, the following would be a little bit faster:
l where not l[`c1] in r[`c1]
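As a quick sanity check with the tables defined above, all three approaches agree (the drop version returns a keyed table, so it is unkeyed with 0! before comparing):
q)(select from l where c1 in l[`c1] except r`c1)~0!#[1#`c1;r]_1!l
1b
q)(select from l where c1 in l[`c1] except r`c1)~l where not l[`c1] in r[`c1]
1b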
Why can't I retrieve the first distinct row with just any other expression in the ORDER BY? Why must the leftmost ORDER BY expression be the same expression I used in DISTINCT ON?
Well, the ORDER BY is needed to keep the rows together that share the same value in the "distinct" columns. The database processes them sequentially, discarding all subsequent rows from the same set. If the rows weren't sorted, this wouldn't be easily possible.
Assume this set of rows:
c1 | c2
---+----
1 | 100
2 | 10
1 | 200
2 | 15
If you want c1 to be unique and to pick the highest c2 for each, you would use:
select distinct on (c1) *
from the_table
order by c1, c2 desc;
The ORDER BY by itself generates the following intermediate result:
c1 | c2
---+----
1 | 200
1 | 100
2 | 15
2 | 10
By processing that result row by row, the database can efficiently discard every row but the first for each c1 value, simply by checking whether that value changes from one row to the next. If the result weren't sorted, this check would become far more complicated. DISTINCT ON then keeps only the first row of each group, yielding (1, 200) and (2, 15).
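For comparison, here is a sketch of the same pick-the-first-row-per-group logic written with a window function; it should produce the same result as the DISTINCT ON query above:
SELECT c1, c2
FROM (
  SELECT c1, c2,
         ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY c2 DESC) AS rn
  FROM the_table
) ranked
WHERE rn = 1;  -- keeps (1, 200) and (2, 15)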
I would like to transform the following two-row table, generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
to the single-row table below:
tb1: ([] time: enlist 2010.01.01; price_buy: enlist 100; price_sell: enlist 101; size_buy: enlist 30; size_sell: enlist 50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100 30 101 50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading; typical financial tables in kdb+ use the column layout date, time, sym, etc.
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101 50 100 30
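If each time group is guaranteed to contain exactly one Buy and one Sell row, another sketch avoids joins entirely by indexing each group's vectors with side?`Buy and side?`Sell (column names mirror the question):
q)select price_buy:price side?`Buy, price_sell:price side?`Sell, size_buy:size side?`Buy, size_sell:size side?`Sell by time from tb
time      | price_buy price_sell size_buy size_sell
----------| ---------------------------------------
2010.01.01| 100       101        30       50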
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100 101 30 50
I have a pivot function on GitHub, but it doesn't support multiple value columns.
/ t: source table; rc: row key column(s); cf: column whose distinct values become new columns; ff: dict mapping value column to aggregate function
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join two pivoted results to achieve the expected output:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.
I have a table that looks like this:
column1 column2 value
a d 10
a e 20
b d 30
b e 40
and I want to get the rank of the value without considering column1, so I use an LOD expression to first get the total value: {EXCLUDE [column1]: SUM([value])}
This works and produces:
rank
d 40
e 60
d 40
e 60
BUT what I actually want is the rank, so I'd like:
rank
d 2
e 1
d 2
e 1
When I do this: RANK({EXCLUDE [Pct Of Adv Buckets]: SUM([Notional])})
I get the error "all fields must be aggregate or constants when using table calculations". Can you advise how to get the rank?
Have you tried RANK(SUM({EXCLUDE [Column1]: SUM([Value])}))?
I get a slightly different ranking, but I think it is what you are looking for.
I'm trying to add the values of two rows together. The idea is that I get the m1, m2, and m3 totals for the rows that fit the criteria: area = '000000', ownership = '50', and code = 113 or 114. The totals should be 42, 40, and 44 respectively. Until now I have been doing this in Excel, but I am trying to take Excel out of the process. There are no NULL values involved.
Any idea why I am getting this error?
select sum (m1,m2,m3),
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114');
sample data
area ownership code m1 m2 m3
000000 50 113 40 38 42
000000 50 114 2 2 2
desired result
000000 50 113+114 42 40 44
In SQL, SUM(column) is an aggregate function that sums a column's values across rows. If you want to add values from a single row, use SELECT m1 + m2 + m3 FROM .... You can also add the column values within each row and then sum across rows, as in SUM(m1 + m2 + m3). I would rewrite your query as:
SELECT SUM(m1) sum1, SUM(m2) sum2, SUM(m3) sum3
FROM dbo.tablename
WHERE area='000000' AND ownership='50' AND (code='113' OR code='114');
which gives the three sums from your desired result:
desired result
area | ownership| code | m1 | m2 | m3
000000| 50 | 113+114| 42 | 40 | 44
Once you also want to see area and ownership, those columns have to appear in the SELECT list and in a GROUP BY clause.
Like:
select area, ownership, sum(code), sum(m1), sum(m2), sum(m3)
from dbo.tablename
where area='000000' and ownership='50' and (code='113' or code='114')
group by area, ownership;
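As an aside, the row-wise addition mentioned above can also be combined with the aggregate to produce a single grand total over the matched rows; for the sample data this returns 126 (120 from the first row plus 6 from the second):
SELECT SUM(m1 + m2 + m3) AS grand_total
FROM dbo.tablename
WHERE area='000000' AND ownership='50' AND (code='113' OR code='114');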
I have a table like this:
postcode | value | uns
AA | 10 | 51
AB | 20 | 78
AA | 20 | 78
AB | 50 | 51
and I want to get a result like:
AA | 0.5
AB | 2.5
where, for each postcode, the new value is the value with uns = 51 divided by the value with uns = 78.
How can I do that with Postgres? I already checked window functions and partitions but I am not sure how to do it.
If (postcode, uns) is unique, all you need is a self-join:
select postcode, uns51.value / nullif(uns78.value, 0)
from t uns51
join t uns78 using (postcode)
where uns51.uns = 51
and uns78.uns = 78
If the rows with either t.uns = 51 or t.uns = 78 may be missing, you could use a full join instead (with possibly coalesce() to provide default values for missing rows).
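A sketch of that full join variant, using the same table and column names; note that if value is an integer column, Postgres truncates integer division (10 / 20 yields 0, not 0.5), so one operand should be cast to numeric:
SELECT postcode,
       uns51.value::numeric / NULLIF(uns78.value, 0) AS ratio
FROM      (SELECT postcode, value FROM t WHERE uns = 51) uns51
FULL JOIN (SELECT postcode, value FROM t WHERE uns = 78) uns78 USING (postcode);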
pozs' solution is nice and simple, nothing wrong with it. Just adding two alternatives:
1. Correlated subquery
SELECT postcode
, value / (SELECT NULLIF(value, 0) FROM t WHERE postcode = uns51.postcode AND uns = 78)
FROM t uns51
WHERE uns = 51;
For only one or a few rows.
2. Conditional aggregate
SELECT postcode
  , min(value) FILTER (WHERE uns = 51) / NULLIF(min(value) FILTER (WHERE uns = 78), 0)
FROM t
GROUP BY postcode;
May be faster when processing most or all of the table.
It can also deal with duplicates per (postcode, uns); use an aggregate function of your choice to pick the right value from each group. For just one row per group, min() is as good as max() or sum().
About the aggregate FILTER:
Aggregate columns with additional (distinct) filters