DB2 - select multiple key values for common primary key

I have the following data:
ID KEY Value
1 K1 1
1 K2 2
2 K1 3
2 K2 4
3 K1 5
3 K2 6
I need a SELECT in DB2 that displays the data above as follows:
ID Key1 Value Key2 Value
1 K1 1 K2 2
2 K1 3 K2 4
3 K1 5 K2 6

You have to join your source table with itself.
One possible solution would be:
select t1.ID, t1.KEY as KEY1, t1.value, t2.KEY as KEY2, t2.value
from <tabname> t1,
<tabname> t2
where t1.ID = t2.ID
and t1.KEY='K1'
and t2.KEY='K2'
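To try the self-join without a DB2 instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name `kv` and the `VALUE1`/`VALUE2` aliases are made up for illustration, and SQLite is only a stand-in, so DB2-specific behaviour is not covered.

```python
import sqlite3

# Hypothetical table "kv" holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE kv (ID INTEGER, "KEY" TEXT, VALUE INTEGER)')
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?)",
    [(1, "K1", 1), (1, "K2", 2), (2, "K1", 3),
     (2, "K2", 4), (3, "K1", 5), (3, "K2", 6)],
)

# Self-join: t1 supplies the K1 row and t2 the K2 row of the same ID.
pivot_rows = conn.execute(
    '''
    SELECT t1.ID, t1."KEY" AS KEY1, t1.VALUE AS VALUE1,
           t2."KEY" AS KEY2, t2.VALUE AS VALUE2
    FROM kv t1
    JOIN kv t2 ON t1.ID = t2.ID
    WHERE t1."KEY" = 'K1' AND t2."KEY" = 'K2'
    ORDER BY t1.ID
    '''
).fetchall()

for row in pivot_rows:
    print(row)
```

Note that an ID with only a K1 row or only a K2 row drops out of this result, because the inner join needs both rows to exist.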

Related

How to find duplicates in associated fields in PostgreSQL?

I have table in postgresql that has the following values:
KEY VALNO
1 a1
2 x1
3 x2
4 a3
5 a1
6 x2
7 a4
8 a5
9 x6
4 x7
7 a6
KEY is expected to be unique, but there are duplicates (4 and 7). Each VALNO should have a single KEY assigned to it, but some VALNOs appear under multiple KEYs (a1 under both 1 and 5, x2 under both 3 and 6).
I tried the following SQL to find the duplicates, but it did not work:
select KEY, VALNO from mbs m1
where (select count(*) from mbs m2
where m1.KEY = m2.KEY) > 1
order by KEY
Is there a better way to find VALNOs that were used with different KEYs, and KEYs that were used with different VALNOs?
i.e.
Duplicate VALNO
KEY VALNO
1 a1
5 a1
3 x2
6 x2
Duplicate KEY
KEY VALNO
4 x7
7 a6
For VALNO duplicate records, we can use COUNT as an analytic function:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY VALNO) cnt
FROM mbs
)
SELECT "KEY", VALNO
FROM cte
WHERE cnt > 1;
The logic for the KEY duplicate query is almost identical, except that we can use this for the count definition:
COUNT(*) OVER (PARTITION BY "KEY") cnt
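Both variants can be checked quickly with the sample rows. The sketch below uses Python's `sqlite3` as a stand-in for PostgreSQL (it assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived); the helper name `duplicates` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mbs ("KEY" INTEGER, VALNO TEXT)')
conn.executemany(
    "INSERT INTO mbs VALUES (?, ?)",
    [(1, "a1"), (2, "x1"), (3, "x2"), (4, "a3"), (5, "a1"), (6, "x2"),
     (7, "a4"), (8, "a5"), (9, "x6"), (4, "x7"), (7, "a6")],
)

# Only these two column names may be interpolated into the SQL text.
ALLOWED = {"KEY": '"KEY"', "VALNO": "VALNO"}


def duplicates(column):
    """Return every row whose value in `column` occurs more than once."""
    col = ALLOWED[column]
    query = f'''
        WITH cte AS (
            SELECT "KEY", VALNO,
                   COUNT(*) OVER (PARTITION BY {col}) AS cnt
            FROM mbs
        )
        SELECT "KEY", VALNO
        FROM cte
        WHERE cnt > 1
        ORDER BY {col}, "KEY", VALNO
    '''
    return conn.execute(query).fetchall()


dup_valno = duplicates("VALNO")  # rows sharing a VALNO
dup_key = duplicates("KEY")      # rows sharing a KEY
print(dup_valno)
print(dup_key)
```

The KEY variant returns all rows that share a duplicated KEY (both rows for 4 and both rows for 7), not only the later one shown in the question's expected output.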

Subsetting records that contain multiple values in one column

In my postgres table, I have two columns of interest: id and name. My goal is to keep only the records where id has more than one value in name. In other words, I would like to keep all records of ids that have multiple values, where at least one of those values is B.
UPDATE: I have tried adding WHERE EXISTS to the queries below, but this does not work.
The sample data would look like this:
> test
id name
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 2 B
9 1 B
10 2 B
and the output would look like this:
> output
id name
1 1 A
2 2 A
8 2 B
9 1 B
10 2 B
How would one write a query to select only these kinds of records?
Based on your description you would seem to want:
select id, name
from (select t.*, min(name) over (partition by id) as min_name,
max(name) over (partition by id) as max_name
from t
) t
where min_name < max_name;
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
from test t2
where t1.id = t2.id
and t1.name <> t2.name) -- this will select those with multiple names for the id
and exists (select *
from test t3
where t1.id = t3.id
and t3.name = 'B') -- this will select those with at least one b for that id
So you want the records whose id shows up with more than one name, right?
This could be formulated in SQL as follows:
select * from test t1
where id in (
select id
from test t2
group by id
having count(distinct name) > 1)
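For anyone who wants to compare the approaches on the sample data, here is a small sketch using Python's `sqlite3` in place of PostgreSQL. It runs the EXISTS query and a grouped variant; the grouped variant uses COUNT(DISTINCT name), which is my assumption so that repeated rows with the same name do not count as multiple names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"),
     (6, "A"), (7, "A"), (2, "B"), (1, "B"), (2, "B")],
)

# Variant 1: two EXISTS checks (several names for the id, and one of them is B).
exists_rows = conn.execute(
    '''
    SELECT id, name
    FROM test t1
    WHERE EXISTS (SELECT 1 FROM test t2
                  WHERE t1.id = t2.id AND t1.name <> t2.name)
      AND EXISTS (SELECT 1 FROM test t3
                  WHERE t1.id = t3.id AND t3.name = 'B')
    ORDER BY id, name
    '''
).fetchall()

# Variant 2: grouped subquery; this one does not check for B at all.
grouped_rows = conn.execute(
    '''
    SELECT id, name
    FROM test
    WHERE id IN (SELECT id FROM test
                 GROUP BY id
                 HAVING COUNT(DISTINCT name) > 1)
    ORDER BY id, name
    '''
).fetchall()

print(exists_rows)
print(grouped_rows)
```

On this sample both variants return the same rows, since every id with several names happens to include a B. They would differ for an id that had, say, A and C only.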

How to merge datasets based on an identifier which is non-unique in both datasets

When two datasets in Stata are to be merged on a variable that is non-unique in both of them, merge x:x does not appear to be a useful tool. What strategy would yield the desired results?
Stylized example:
Dataset1
AssetManager | Bankcode
A 1
B 2
B 3
C 3
Dataset2
Bankcode | t
1 t1
1 t2
2 t1
2 t2
3 t1
3 t2
Aim:
AssetManager | Bankcode | t
A 1 t1
A 1 t2
B 2 t1
B 2 t2
B 3 t1
B 3 t2
C 3 t1
C 3 t2
Intuition:
Some asset managers can be held by multiple banks, while some banks also own multiple asset managers.
The use of merge m:m is discouraged (read the corresponding entries in the Stata manuals), and many people support its elimination. Try joinby:
clear
set more off
input ///
str1 AssetManager Bankcode
A 1
B 2
B 3
C 3
end
tempfile first
save "`first'"
clear
input ///
Bankcode str2 t
1 t1
1 t2
2 t1
2 t2
3 t1
3 t2
end
joinby Bankcode using "`first'"
sort AssetManager Bankcode t
order AssetManager Bankcode
list, sepby(AssetManager)
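To make clear what joinby produces here, the following is a rough Python sketch of the same pairwise logic: within each Bankcode, every row of one dataset is combined with every row of the other. The function name `pairwise_join` is made up, and the sketch simply drops unmatched rows, which as far as I know is also what joinby does unless the unmatched() option is given.

```python
from collections import defaultdict

# The two stylized datasets from the question.
asset_managers = [("A", 1), ("B", 2), ("B", 3), ("C", 3)]
periods = [(1, "t1"), (1, "t2"), (2, "t1"), (2, "t2"), (3, "t1"), (3, "t2")]


def pairwise_join(left, right):
    """Form all pairwise combinations of rows that share a Bankcode.

    left holds (AssetManager, Bankcode) rows, right holds (Bankcode, t) rows.
    Rows without a partner on the other side are dropped.
    """
    t_by_code = defaultdict(list)
    for bankcode, t in right:
        t_by_code[bankcode].append(t)
    return sorted(
        (manager, bankcode, t)
        for manager, bankcode in left
        for t in t_by_code.get(bankcode, [])
    )


joined = pairwise_join(asset_managers, periods)
for row in joined:
    print(row)
```

This yields the eight rows listed under "Aim" in the question.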

Hive SubQuery and Group BY

I have two tables
table1:
id
1
2
3
table 2:
id date
1 x1
4 x2
1 x3
3 x4
3 x5
1 x6
3 x5
6 x6
6 x5
3 x6
I want the count of rows in table 2 for each id that is present in table 1.
Result
id count
1 3
2 0
3 4
I am using this query, but it's giving me an error:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
GROUP BY tab2.id
WHERE tab2.id IN (select id from <mytable1>)
;
Error is:
missing EOF at 'WHERE' near 'di_device_id'
There are two possible issues. Sub queries in the WHERE clause are only supported from Hive 0.13 and up. If you are using such a version, then your problem is just that you have WHERE and GROUP BY the wrong way round:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
WHERE tab2.id IN (select id from <mytable1>)
GROUP BY tab2.id
;
If you are using an older version of Hive then you need to use a JOIN:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2 INNER JOIN <mytable1> tab1 ON (tab2.id = tab1.id)
GROUP BY tab2.id
;
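The clause-order point can be checked without a Hive cluster. The sketch below uses Python's `sqlite3` purely as a stand-in (Hive's parser and error messages differ), with the table names `table1` and `table2` assumed from the question. The last query, a LEFT JOIN starting from table1, is not from the answers; it is one possible way to also get the `2 0` row shown in the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute('CREATE TABLE table2 (id INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, "x1"), (4, "x2"), (1, "x3"), (3, "x4"), (3, "x5"),
     (1, "x6"), (3, "x5"), (6, "x6"), (6, "x5"), (3, "x6")],
)

# GROUP BY placed before WHERE is a syntax error here as well.
try:
    conn.execute(
        "SELECT id, COUNT(id) FROM table2 GROUP BY id "
        "WHERE id IN (SELECT id FROM table1)"
    )
    wrong_order_fails = False
except sqlite3.OperationalError:
    wrong_order_fails = True

# WHERE first, then GROUP BY.
in_rows = conn.execute(
    "SELECT tab2.id, COUNT(tab2.id) FROM table2 tab2 "
    "WHERE tab2.id IN (SELECT id FROM table1) "
    "GROUP BY tab2.id ORDER BY tab2.id"
).fetchall()

# Same result through an inner join.
join_rows = conn.execute(
    "SELECT tab2.id, COUNT(tab2.id) FROM table2 tab2 "
    "INNER JOIN table1 tab1 ON tab2.id = tab1.id "
    "GROUP BY tab2.id ORDER BY tab2.id"
).fetchall()

# Assumption beyond the answers: LEFT JOIN from table1 keeps ids with no match.
left_rows = conn.execute(
    "SELECT tab1.id, COUNT(tab2.id) FROM table1 tab1 "
    "LEFT JOIN table2 tab2 ON tab2.id = tab1.id "
    "GROUP BY tab1.id ORDER BY tab1.id"
).fetchall()

print(wrong_order_fails, in_rows, join_rows, left_rows)
```

The IN and INNER JOIN forms agree with each other but skip id 2, because it has no rows in table 2 to count.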
You have two issues:
WHERE comes before GROUP BY. In SQL, HAVING is what you use to filter after grouping.
Hive doesn't support all types of nested queries in the WHERE clause. See here: Hive Subqueries
However, your type of subquery will be OK. Try this:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
WHERE tab2.id IN (select id from <mytable1>)
GROUP BY tab2.id;
It will do exactly what you meant.
Edit: I just checked @MattinBit's answer. I didn't intend to duplicate it. His answer is more complete!

How to join two tables without repetition of the cells from the second table in PostgreSQL using PLSQL

When I try to join the two tables below, I am not able to get the output I want.
I tried using a join but it didn't work. Let me know if it's possible with plsql.
Table 1:
col1 col2
1 a
1 b
1 c
2 a
2 b
3 a
table 2:
col1 col2
1 x
1 y
2 x
2 y
3 x
3 y
The output must be:
col1 col2 col3
1 a x
1 b y
1 c
2 a x
2 b y
3 a x
3 y
If I use a join, I am not able to get the same output as above.
The output I am getting is
1 a x
1 a y
1 b x
1 b y
1 c x
1 c y
2 a x
.....
.....
3 a x
3 a y
What you are looking for is called a FULL OUTER JOIN. The result of this join contains rows from both input tables, and matching records are combined.
You can find more information here: https://stackoverflow.com/questions/4796872/full-outer-join-in-mysql
Using window functions, specifically ROW_NUMBER() partitioned by col1 in both tables, we can get a per-group row number that can be used as part of the join.
In other words, it seems to me that the order of the records is crucial for the join and the result set you want. Furthermore, @Benvorth's suggestion of a FULL OUTER JOIN gives the NULLs in both directions. I believe this might work:
SELECT
COALESCE(t1.col1,t2.col1) as col1,
t1.col2,
t2.col2 as col3
FROM
(SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table1) t1
FULL OUTER JOIN
(SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table2) t2 ON
t1.col1 = t2.col1 AND
t1.col1_row_number = t2.col1_row_number
That ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) bit will create a row number for each record. The row number restarts at 1 for each new col1 value encountered. You can think of it as a rank within each distinct col1 value, based on col2's value. Table1's output from the subquery SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) as col1_row_number FROM table1 will look like:
Table 1:
col1 col2 col1_row_number
1 a 1
1 b 2
1 c 3
2 a 1
2 b 2
3 a 1
So we do that with both tables, then we use that row number as part of the join along with col1.
A sqlfiddle showing this matching your desired result from the question
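If it helps to see the pairing logic outside SQL, here is a plain Python sketch of the same idea: rows are sorted by col2 within each col1 group (the row number), then matched position by position, with None standing in for NULL where one side runs out. The function names are made up, and this only mirrors the logic of the query above rather than how PostgreSQL executes it.

```python
from collections import defaultdict
from itertools import zip_longest

table1 = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"), (3, "a")]
table2 = [(1, "x"), (1, "y"), (2, "x"), (2, "y"), (3, "x"), (3, "y")]


def group_sorted(rows):
    """Map each col1 value to its col2 values, sorted ascending.

    The position in each sorted list plays the role of ROW_NUMBER().
    """
    groups = defaultdict(list)
    for col1, col2 in rows:
        groups[col1].append(col2)
    return {col1: sorted(values) for col1, values in groups.items()}


def pair_by_position(left, right):
    """Match rows on (col1, position), keeping unmatched rows from both sides."""
    left_groups = group_sorted(left)
    right_groups = group_sorted(right)
    result = []
    for col1 in sorted(left_groups.keys() | right_groups.keys()):
        pairs = zip_longest(left_groups.get(col1, []), right_groups.get(col1, []))
        for left_value, right_value in pairs:
            result.append((col1, left_value, right_value))
    return result


paired = pair_by_position(table1, table2)
for row in paired:
    print(row)
```

The rows `(1, 'c', None)` and `(3, None, 'y')` correspond to the blank cells in the desired output.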