How to find duplicates in associated fields in PostgreSQL? - postgresql

I have table in postgresql that has the following values:
KEY VALNO
1 a1
2 x1
3 x2
4 a3
5 a1
6 x2
7 a4
8 a5
9 x6
4 x7
7 a6
KEY is expected to hold unique values, but there are duplicates (4, 7). Each VALNO should have a unique KEY assigned to it, but the same VALNO has used multiple KEYs (a1 used both 1 & 5, x2 used both 3 & 6).
I tried the following SQL to find the duplicates, but could not get it to work:
select KEY, VALNO from mbs m1
where (select count(*) from mbs m2
where m1.KEY = m2.KEY) > 1
order by KEY
Is there a better way to find where the same VALNO has used different KEYs, and where the same KEY has used different VALNOs?
i.e.
Duplicate VALNO
KEY VALNO
1 a1
5 a1
3 x2
6 x2
Duplicate KEY
KEY VALNO
4 x7
7 a6

For VALNO duplicate records, we can use COUNT as an analytic function:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY VALNO) cnt
FROM mbs
)
SELECT "KEY", VALNO
FROM cte
WHERE cnt > 1;
The logic for the KEY duplicate query is almost identical, except that we can use this for the count definition:
COUNT(*) OVER (PARTITION BY "KEY") cnt
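Putting the two together, the full query for duplicate KEYs would look like this (the same CTE pattern as above, only the PARTITION BY changes):
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY "KEY") cnt
FROM mbs
)
SELECT "KEY", VALNO
FROM cte
WHERE cnt > 1;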

Related

Need t-sql view with difficult sort order

I have a sorting issue in a SQL Server 2017 view. To simplify the question: I have a table with hierarchical data that has two columns: key and txt. The key column is used for the hierarchical order and has one, two or three positions. The txt column just has random text values. I need to sort the data, but on a combination of both the key and txt columns. To be more precise, I need to get from the left view (sorted on the key column) to the right view (the sort I need):
key  txt        key  txt
A    de         A    de
A1   al         A1   al
A2   nl         A3   gt
A3   gt         A31  oj
A31  oj         A2   nl
B    pf         B    pf
B1   zf         B4   ar
B2   br         B42  cd
B3   qa         B41  ik
B31  lb         B2   br
B32  bn         B3   qa
B33  kt         B32  bn
B4   ar         B33  kt
B41  ik         B31  lb
B42  cd         B1   zf
So the view should first show the top level (key is one character) and then, below that row, the txt values alphabetically (key is two characters). But if the key has three characters, the rows must be placed alphabetically under the matching two-character key. In the example above, the row with key A31 must be listed directly under the row with key A3, the row with key B42 must be directly below B4 and B41 below B42, etc.
I have tried many things, but I cannot get the rows with three-character keys to appear directly under the proper two-character key rows.
This is an example of what I tried:
SELECT *
FROM tbl
ORDER BY CASE LEN(key) WHEN 1 THEN key
WHEN 2 THEN LEFT(key, 1) + '10'
ELSE LEFT(key, 1) + '20'
END, txt
But this places the rows with three character keys at the bottom of the list...
Hope someone can put me in the right direction.
This is a really complicated process because your rules are more complicated than your schema. Here's my attempt, using window functions to group things together and determine which two-character key has the lowest txt value, then performing a series of ordering conditionals:
WITH cte AS
(
SELECT [key],
l = LEN([key]),
k1 = LEFT([key],1),
k2 = LEFT([key],2),
txt
FROM dbo.YourTableName
),
cte2 AS
(
SELECT *,
LowestTxt = MIN(CASE WHEN l = 2 THEN txt END) OVER (PARTITION BY k2),
Len2RN = ROW_NUMBER() OVER (PARTITION BY k2
ORDER BY CASE WHEN l = 2 THEN txt ELSE 'zzzzz' END)
FROM cte
)
SELECT [key], txt
FROM cte2
ORDER BY k1,
CASE WHEN l > 1 THEN 1 END,
LowestTxt,
CASE WHEN l = 2 THEN 'aaa' ELSE txt END,
Len2RN;
Example in this working fiddle.
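For reference, a minimal setup to run the query against the question's sample data might look like this (using the dbo.YourTableName name assumed in the answer):
CREATE TABLE dbo.YourTableName ([key] varchar(3), txt varchar(2));
INSERT INTO dbo.YourTableName ([key], txt) VALUES
('A','de'),('A1','al'),('A2','nl'),('A3','gt'),('A31','oj'),
('B','pf'),('B1','zf'),('B2','br'),('B3','qa'),('B31','lb'),
('B32','bn'),('B33','kt'),('B4','ar'),('B41','ik'),('B42','cd');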

PSQL filter each group of rows

Recently I've faced a pretty rare filtering case in PostgreSQL.
My question is: how do I filter out redundant elements in each group of a grouped table?
For example, we have this table:
id | group_idx | filter_idx
1  | 1         | x
2  | 3         | z
3  | 3         | x
4  | 2         | x
5  | 1         | x
6  | 3         | x
7  | 2         | x
8  | 1         | z
9  | 2         | z
Firstly, to group rows:
SELECT group_idx FROM table
GROUP BY group_idx;
But how can I filter redundant fields (filter_idx = z) from each group after grouping?
P.S. I can't just write it like this, because I need to find the groups first:
SELECT group_idx FROM table
WHERE filter_idx <> 'z';
Thanks.
Assuming that you want to see all groups at all times, even when you filter out all records of some group:
drop table if exists test cascade;
create table test (id integer, group_idx integer, filter_idx character);
insert into test
(id,group_idx,filter_idx)
values
(1,1,'x'),
(2,3,'z'),
(3,3,'x'),
(4,2,'x'),
(5,1,'x'),
(6,3,'x'),
(7,2,'x'),
(8,1,'z'),
(9,2,'z'),
(0,4,'y');--added an example of a group that would be discarded using WHERE.
Get groups in one query, filter your rows in another, then left join the two.
select groups.group_idx,
string_agg(filtered_rows.filter_idx,',')
from
(select distinct group_idx from test) groups
left join
(select group_idx,filter_idx from test where filter_idx<>'y') filtered_rows
using (group_idx)
group by 1;
-- group_idx | string_agg
-------------+------------
-- 3 | z,x,x
-- 4 |
-- 2 | x,x,z
-- 1 | x,x,z
--(4 rows)
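The 'y' filter above is only there to demonstrate the extra group 4; with the question's original condition the same pattern applies, e.g.:
select groups.group_idx,
string_agg(filtered_rows.filter_idx,',')
from
(select distinct group_idx from test) groups
left join
(select group_idx,filter_idx from test where filter_idx<>'z') filtered_rows
using (group_idx)
group by 1;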

reshaping table based on column values

I was looking at a problem of reshaping a table, creating new columns based on column values.
I'm using the same example as the problem discussed here: A complicated sum in R data.table that involves looking at other columns
So I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
The goal is to transform it to get the sum of values by group and type. Please note the new columns created: basically all types in Type_v1 and Type_v2 are used to create columns for the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I did the beginning, but I am unable to transform the table and create the new columns.
Also, of course, I'm trying to get all the columns created in a dynamic way, as it would not be possible to input 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
Basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
Any help and explanations appreciated.
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite get you the exact output, but pivot is the concept.
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross but you would have to restructure the list with flip and cut
flip n cut `$("v_";"typCol_") cross string 1 + til n
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.
The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is kind of a hacky way to get the type columns: I select each t1 t2 t3 along with a typCol_1 _2 _3,
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column

DB2 - select multiple key values for common primary key

I have following data
ID KEY Value
1 K1 1
1 K2 2
2 K1 3
2 K2 4
3 K1 5
3 K2 6
I need to do a select from the above table data in DB2 to display it like the following:
ID Key1 Value Key2 Value
1 K1 1 K2 2
2 K1 3 K2 4
3 K1 5 K2 6
You have to join your source table with itself.
One possible solution would be:
select t1.ID, t1.KEY as KEY1, t1.value, t2.KEY as KEY2, t2.value
from <tabname> t1,
<tabname> t2
where t1.ID = t2.ID
and t1.KEY='K1'
and t2.KEY='K2'
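If an ID can have only one of the two keys, the inner self-join above would drop it. A hedged alternative is conditional aggregation (standard SQL, using the question's column names and the answer's <tabname> placeholder), which keeps such IDs and needs only one pass over the table:
select ID,
max(case when t.KEY = 'K1' then t.KEY end) as KEY1,
max(case when t.KEY = 'K1' then t.value end) as VALUE1,
max(case when t.KEY = 'K2' then t.KEY end) as KEY2,
max(case when t.KEY = 'K2' then t.value end) as VALUE2
from <tabname> t
group by ID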

Hive SubQuery and Group BY

I have two tables
table1:
id
1
2
3
table 2:
id date
1 x1
4 x2
1 x3
3 x4
3 x5
1 x6
3 x5
6 x6
6 x5
3 x6
I want the count of each id from table 2 that is present in table 1.
Result
id count
1 3
2 0
3 4
I am using this query, but it's giving me an error:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
GROUP BY tab2.id
WHERE tab2.id IN (select id from <mytable1>)
;
Error is:
missing EOF at 'WHERE' near 'di_device_id'
There are two possible issues. Subqueries in the WHERE clause are only supported from Hive 0.13 and up. If you are using such a version, then your problem is just that you have WHERE and GROUP BY the wrong way round:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
WHERE tab2.id IN (select id from <mytable1>)
GROUP BY tab2.id
;
If you are using an older version of Hive then you need to use a JOIN:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2 INNER JOIN <mytable1> tab1 ON (tab2.id = tab1.id)
GROUP BY tab2.id
;
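Note that neither query returns ids that have no rows at all in table 2, while the expected result shows id 2 with a count of 0. If you need those zero counts, one sketch (assuming the same placeholder table names) is to drive from table 1 with a LEFT OUTER JOIN and count only the matches:
SELECT tab1.id, count(tab2.id)
FROM <mytable1> tab1 LEFT OUTER JOIN <mytable2> tab2 ON (tab1.id = tab2.id)
GROUP BY tab1.id
;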
You have two issues:
WHERE comes before GROUP BY. In SQL syntax you use HAVING to filter after grouping.
Hive doesn't support all types of nested queries in the WHERE clause. See here: Hive Subqueries
However, your type of subquery will be OK. Try this:
SELECT tab2.id, count(tab2.id)
FROM <mytable2> tab2
WHERE tab2.id IN (select id from <mytable1>)
GROUP BY tab2.id;
It will do exactly what you meant.
Edit: I just checked MattinBit's answer. I didn't intend to duplicate it; his answer is more complete!