KDB get only the rows present only in one table

I have two tables, table1 and table2. Both have 4 columns with the same column names. table1 has 50 rows and table2 has 100 rows. How can I get only those rows from table2 which are not in table1? I tried a left join, but I couldn't make it work, since a left join can't be done on all columns.

Since tables are lists of dictionaries, you could use the except keyword to exclude all rows from table2 which are found in table1.
For example:
q)table1:([]a:til 3;b:3#.Q.a;c:3#.Q.A)
q)table1
a b c
-----
0 a A
1 b B
2 c C
q)table2:([]a:til 6;b:6#.Q.a;c:6#.Q.A)
q)table2
a b c
-----
0 a A
1 b B
2 c C
3 d D
4 e E
5 f F
q)table2 except table1
a b c
-----
3 d D
4 e E
5 f F
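The same idea can be sketched in Python for readers less familiar with q: treat each table as a list of row tuples and keep every row of table2 whose complete tuple (all columns at once) is absent from table1. This only illustrates the row-difference semantics and is not kdb code; the names TABLE1, TABLE2 and rows_only_in_first are hypothetical.

```python
def rows_only_in_first(first, second):
    """Return rows of `first` whose full tuple (all columns) is not in `second`.

    Order and any repeated rows of `first` are preserved.
    """
    exclude = set(second)  # hashable row tuples give fast membership tests
    return [row for row in first if row not in exclude]


# Sample data mirroring table1 and table2 from the q session above.
TABLE1 = [(i, "abc"[i], "ABC"[i]) for i in range(3)]
TABLE2 = [(i, "abcdef"[i], "ABCDEF"[i]) for i in range(6)]

print(rows_only_in_first(TABLE2, TABLE1))
# [(3, 'd', 'D'), (4, 'e', 'E'), (5, 'f', 'F')]
```

Comparing whole tuples is what removes the need for a join key: every column takes part in the match.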

Related

How to query table and sum up certain columns by criteria, but not others?

From a starting table, let's say:
A  B  C
1  1  99
2  2  88
3  3  77
I'm trying to write a query that returns a table with a different value in column C, based on this criterion: when A is 2, C should be its existing value plus the value of C where A is 1. Here's the desired result:
A  B  C
1  1  99
2  2  187
3  3  77
I'm unsure whether grouping makes sense here, especially since there might be multiple similar criteria. The closest query I could think of is:
SELECT A, B, C+(SELECT C FROM table1 WHERE A=1 LIMIT 1) FROM table1 WHERE A=2;
but this isn't valid SQL, since subqueries can't be used like this. Any suggestions are welcome, even if they involve somehow altering the structure of the original table.

Consider the approach below (tested in BigQuery):
select a, b, c +
case a
when 2 then sum(if(a = 1, c, 0)) over()
else 0
end c
from your_table
If applied to the sample data in your question, the output is:
A  B  C
1  1  99
2  2  187
3  3  77

SELECT
    A,
    B,
    CASE
        -- scalar subquery: assumes exactly one row has A = 1
        WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
        ELSE C
    END AS C
FROM
    table1;
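A quick way to check the CASE/scalar-subquery query is to run it against the sample data in an in-memory SQLite database from Python. This is a sketch: the table name table1 comes from the question, the function name add_a1_c_to_a2 is hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def add_a1_c_to_a2(rows):
    """Return (A, B, C) rows where C for A = 2 is increased by C of the A = 1 row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (A INTEGER, B INTEGER, C INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT A, B,
               CASE
                   -- inner A and C refer to the subquery's own table1
                   WHEN A = 2 THEN C + (SELECT C FROM table1 WHERE A = 1)
                   ELSE C
               END AS C
        FROM table1
        ORDER BY A
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(add_a1_c_to_a2([(1, 1, 99), (2, 2, 88), (3, 3, 77)]))
# [(1, 1, 99), (2, 2, 187), (3, 3, 77)]
```

If more than one row has A = 1, SQLite quietly uses the first one, whereas engines such as PostgreSQL raise an error for a scalar subquery returning several rows.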

Finding set of rows in table based on matching rows from another table

I know the title is vague at best, but I can't find a better way to describe my problem.
For example, I have the following two tables:
TableA
IdA  Code  Value
123  A     1
123  B     2
123  C     3
456  A     4
456  F     6
456  E     7
...
TableB
IdB  Code  Value
X    A     1
X    B     2
X    C     3
Y    G     2
Y    D     8
Y    C     3
Z    A     1
Z    B     2
Z    C     3
Z    D     5
...
A set of records for a given IdA in TableA correlates to an equivalent set of records in TableB having a specific IdB.
For instance, for IdA = 123 in TableA, I have exactly three rows with certain codes and values, this would "map" to rows with IdB = X in TableB because it has the same combination of Codes and Values and the same number of rows. Note that it would not map to IdB = Z in TableB, because it has an additional row for Code D which IdA = 123 doesn't have in TableA.
Given only IdA, how to best write a query to find IdB?
If the codes and values were known, I could have done something similar to this:
SELECT b.IdB FROM TableB b
WHERE
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'A' AND x.Value = '1') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'B' AND x.Value = '2') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'C' AND x.Value = '3') AND
(SELECT COUNT(*) FROM TableB x WHERE x.IdB = b.IdB) = 3
But now I'm only given a value for IdA, so I need to look up values from TableA and combine that in the query for TableB. Any clever ideas on how to tackle this?

This is a case of relational division without remainder.
There are many solutions; here is one:
Take TableB and left join to it the TableA rows for the given IdA.
In that TableA subquery, also calculate the total number of its rows with a window count (COUNT(*) OVER ()).
Group by IdB.
Keep only the groups where the row count equals the number of matches to A (COUNT(a.IdA) counts only non-null values, i.e. matched rows) and also equals the total number of TableA rows we want to match.
DECLARE @idA int = 123;
SELECT
b.IdB
FROM TableB b
LEFT JOIN (
SELECT *,
total = COUNT(*) OVER ()
FROM TableA a
WHERE a.IdA = @idA
) a ON b.Code = a.Code AND b.Value = a.Value
GROUP BY
b.IdB
HAVING COUNT(*) = COUNT(a.IdA)
AND COUNT(*) = MIN(a.total);
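The relational-division test can also be sketched in Python to make the logic explicit: build the multiset of (Code, Value) pairs for the given IdA, group TableB by IdB, and keep the IdB groups that equal it exactly. The helper matching_idbs and the sample lists are hypothetical; for data without repeated (Code, Value) pairs it should select the same IdB values as the query above.

```python
from collections import Counter, defaultdict


def matching_idbs(table_a, table_b, id_a):
    """Return every IdB whose (Code, Value) rows exactly equal those of `id_a`.

    Relational division without remainder: each IdB group in table_b must
    contain all of id_a's rows and nothing else. Rows are compared as
    multisets, so repeated (Code, Value) pairs must repeat equally often.
    """
    # The rows we want to match, taken from TableA.
    target = Counter((code, value) for ida, code, value in table_a if ida == id_a)

    # Group TableB rows by IdB.
    groups = defaultdict(Counter)
    for idb, code, value in table_b:
        groups[idb][(code, value)] += 1

    # An empty target (unknown IdA) matches nothing.
    return [idb for idb, rows in groups.items() if target and rows == target]


# Sample rows from the question (the "..." rows are omitted).
TABLE_A = [(123, "A", 1), (123, "B", 2), (123, "C", 3),
           (456, "A", 4), (456, "F", 6), (456, "E", 7)]
TABLE_B = [("X", "A", 1), ("X", "B", 2), ("X", "C", 3),
           ("Y", "G", 2), ("Y", "D", 8), ("Y", "C", 3),
           ("Z", "A", 1), ("Z", "B", 2), ("Z", "C", 3), ("Z", "D", 5)]

print(matching_idbs(TABLE_A, TABLE_B, 123))  # ['X']
```

Z is rejected because its extra D row makes its multiset larger than IdA = 123's, which is the "without remainder" part.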

Calculation in a table with a value from another table

Assume we have two tables:
Table A:
a b c
x 1 null
x 2 null
y 3 null
Table B:
a b
x 5
y 10
I want to update Table A by multiplying TableA.b by TableB.b and writing the result into TableA.c, where the TableB row is selected by the condition TableA.a = TableB.a. My updated Table A should then look like this:
Table A:
a b c
x 1 5
x 2 10
y 3 30
I thought about joining both tables first, but I'm not sure. What do you think is the easiest and best solution?

In Postgres, you can use the update ... set ... from ... where syntax.
Consider:
update tablea ta
set c = ta.b * tb.b
from tableb tb
where tb.a = ta.a
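For engines without UPDATE ... FROM (SQLite only added it in version 3.33), a correlated subquery does the same job. The sketch below runs it from Python on the question's sample data; table and column names follow the question, while the function name update_c is hypothetical.

```python
import sqlite3


def update_c(rows_a, rows_b):
    """Set tablea.c = tablea.b * tableb.b for rows with matching a; return tablea."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablea (a TEXT, b INTEGER, c INTEGER)")
    con.execute("CREATE TABLE tableb (a TEXT, b INTEGER)")
    con.executemany("INSERT INTO tablea VALUES (?, ?, ?)", rows_a)
    con.executemany("INSERT INTO tableb VALUES (?, ?)", rows_b)
    # Correlated subquery: for each tablea row, look up tableb.b with the same a.
    # A tablea row with no match in tableb gets c = NULL.
    con.execute(
        "UPDATE tablea "
        "SET c = b * (SELECT tb.b FROM tableb tb WHERE tb.a = tablea.a)"
    )
    result = con.execute("SELECT a, b, c FROM tablea ORDER BY a, b").fetchall()
    con.close()
    return result


print(update_c([("x", 1, None), ("x", 2, None), ("y", 3, None)],
               [("x", 5), ("y", 10)]))
# [('x', 1, 5), ('x', 2, 10), ('y', 3, 30)]
```

One behavioural difference: the Postgres UPDATE ... FROM leaves unmatched tablea rows untouched, while this version sets their c to NULL.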

How to merge datasets based on an identifier which is non-unique in both datasets

When two datasets in Stata are to be merged on a variable that is non-unique in both datasets, merge x:x does not appear to be a useful tool. What strategy would yield the desired result?
Stylized example:
Dataset1
AssetManager | Bankcode
A 1
B 2
B 3
C 3
Dataset2
Bankcode | t
1 t1
1 t2
2 t1
2 t2
3 t1
3 t2
Aim:
AssetManager | Bankcode | t
A 1 t1
A 1 t2
B 2 t1
B 2 t2
B 3 t1
B 3 t2
C 3 t1
C 3 t2
Intuition:
Some asset managers can be held by multiple banks, while some banks also own multiple asset managers.

The use of merge m:m is discouraged (read the corresponding entries in the Stata manuals), and many people support its elimination. Try joinby:
clear
set more off
input ///
str1 AssetManager Bankcode
A 1
B 2
B 3
C 3
end
tempfile first
save "`first'"
clear
input ///
Bankcode str2 t
1 t1
1 t2
2 t1
2 t2
3 t1
3 t2
end
joinby Bankcode using "`first'"
sort AssetManager Bankcode t
order AssetManager Bankcode
list, sepby(AssetManager)
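What joinby produces here is a many-to-many inner join: every AssetManager/Bankcode row is paired with every Bankcode/t row sharing the same Bankcode. A rough Python sketch of that pairing on the question's data follows; the function name and lists are hypothetical, and dropping unmatched rows reflects, as far as I know, joinby's default behaviour.

```python
from collections import defaultdict


def joinby_bankcode(managers, periods):
    """Pair every (AssetManager, Bankcode) row with every (Bankcode, t) row
    that has the same Bankcode, i.e. a many-to-many inner join.

    Rows without a partner on the other side are dropped.
    """
    periods_by_code = defaultdict(list)
    for bankcode, t in periods:
        periods_by_code[bankcode].append(t)

    joined = [
        (manager, bankcode, t)
        for manager, bankcode in managers
        for t in periods_by_code.get(bankcode, [])
    ]
    # Same ordering as `sort AssetManager Bankcode t` in the Stata code.
    return sorted(joined)


MANAGERS = [("A", 1), ("B", 2), ("B", 3), ("C", 3)]
PERIODS = [(1, "t1"), (1, "t2"), (2, "t1"), (2, "t2"), (3, "t1"), (3, "t2")]

for row in joinby_bankcode(MANAGERS, PERIODS):
    print(*row)
```

Unlike merge m:m, which matches observations pairwise by their order within each group, this forms all combinations, which is what the "Aim" table requires.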

delete duplicate rows matching with other table

I have two tables
Table A    Table B
--------   ---------
a b c      a b c
a b c      a b c
a b c      a b c
e f g      a b c
h i j      e f g
k l m      k l m
           k l m
           x y z
           s t u
           a b c
           a b c
Now I want to delete rows from Table B that match Table A on columns 1, 2 and 3, so that the number of copies of each matching row in Table B is less than or equal to its count in Table A.
So the output should be
Table A    Table B
--------   ---------
a b c      a b c
a b c      a b c
a b c      a b c
e f g      e f g
h i j      k l m
k l m      x y z
           s t u
I have tried using inner join and intersect but failed to get the desired result.

Try:
DELETE FROM tableB
WHERE ctid IN (
SELECT BB.ctid
FROM (
SELECT a, b, c, count(*) cnt
FROM tablea
GROUP BY a, b, c
) AA
JOIN (
SELECT ctid,
a, b, c,
row_number() over (partition by a,b,c) cnt
FROM tableb
) BB
ON AA.a = BB.a
AND AA.b = BB.b
AND AA.c = BB.c
AND AA.cnt < BB.cnt
)
demo: http://sqlfiddle.com/#!12/73e99/1
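The idea behind the ctid/row_number() answer (number the copies of each row in Table B and discard those beyond Table A's count) can be sketched in plain Python with collections.Counter. This is an illustration on the question's sample data, not a replacement for the SQL; the function name drop_surplus is hypothetical. Unlike row_number() without ORDER BY, which leaves the choice of surviving copies to the database, it always keeps the earliest copies.

```python
from collections import Counter


def drop_surplus(table_a, table_b):
    """Keep each table_b row at most as many times as it occurs in table_a.

    Rows that never occur in table_a are kept. The first copies in table_b
    are the ones kept, and table_b's order is preserved.
    """
    allowed = Counter(table_a)  # copies permitted per row
    seen = Counter()            # copies of each row met so far in table_b
    kept = []
    for row in table_b:
        seen[row] += 1
        if row not in allowed or seen[row] <= allowed[row]:
            kept.append(row)
    return kept


# Each three-letter string stands for one (col1, col2, col3) row.
TABLE_A = [tuple(r) for r in "abc abc abc efg hij klm".split()]
TABLE_B = [tuple(r) for r in "abc abc abc abc efg klm klm xyz stu abc abc".split()]

print(["".join(row) for row in drop_surplus(TABLE_A, TABLE_B)])
# ['abc', 'abc', 'abc', 'efg', 'klm', 'xyz', 'stu']
```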

If the tables aren't big, I think the simplest way is to delete all rows from TableB that exist in TableA and then insert TableA into TableB. Other approaches, IMHO, require at least a primary key in TableB.
DELETE FROM TableB
WHERE EXISTS (SELECT * FROM TableA
              -- qualify TableB's columns; an unqualified C1 would resolve to TableA.C1
              WHERE TableB.C1 = TableA.C1
                AND TableB.C2 = TableA.C2
                AND TableB.C3 = TableA.C3);
INSERT INTO TableB SELECT * FROM TableA;
