Removing rows with duplicate secondary values - postgresql

This one is probably a softball question for any DBA, but here's my challenge. I have a table that looks like this:
id  parent_id  active
--  ---------  ------
1   5          y
2   6          y
3   6          y
4   6          y
5   7          y
6   8          y
The way the system I am working on operates, it should only have one active row per parent. Thus, it'd be ok if ID #2 and #3 were active = 'n'.
I need to run a query that finds all rows with duplicate parent_ids that are active and flips all but the highest ID to active = 'n'.
Can this be done in a single query, or do I have to write a script for it? (Using Postgresql, btw)

ANSI style:
update table set
    active = 'n'
where
    id <> (select max(id) from table t1 where t1.parent_id = table.parent_id)
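If you want to avoid rewriting rows that are already inactive, a small variation of the same query adds an active filter (a sketch, still using the question's placeholder name table):
update table set
    active = 'n'
where
    active = 'y'
    and id <> (select max(id) from table t1 where t1.parent_id = table.parent_id)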
Postgres specific:
update table t1 set
    active = 'n'
from
    (select max(id) as topId, parent_id from table group by parent_id) t2
where
    t1.id < t2.topId
    and t1.parent_id = t2.parent_id
The second one is probably a bit faster, since it's not doing a correlated subquery for each row. Enjoy!
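To preview which rows the update would flip before running it, a read-only check along these lines might help (a sketch built only on the sample rows from the question):
with t(id, parent_id, active) as (
    values (1, 5, 'y'), (2, 6, 'y'), (3, 6, 'y'), (4, 6, 'y'), (5, 7, 'y'), (6, 8, 'y')
)
select id, parent_id,
       case when id = max(id) over (partition by parent_id) then active else 'n' end as active_after
from t
order by id;
-- Only ids 2 and 3 come back with active_after = 'n'; every other row keeps 'y'.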

Related

How to include and exclude ids in one query in PostgreSQL

I use PostgreSQL 13.3
I'm trying to work out how I can include and exclude in a query at the same time.
I have include_system_ids [1,5] and exclude_system_ids [3]
There's one big table - records
system_records table:
record | system_id
-------+----------
     1 |         1
     1 |         5
     1 |         3
     2 |         1
     2 |         5
If a record contains an excluded identifier, then it should not be included in the final selection. I made several attempts, but I didn't get the necessary result.
Expected result: the record with id 2
Actual result: 1, 2
My variants
select r.id from records r
left join (select record_id from system_records
where system_id in (1,5)
) include_ids on r.id = include_ids
left join (select record_id from system_records
where system_id not in (3)
) exclude_ids on r.id = exclude_ids.id
Honestly, I don't understand how I can do it. Is there anyone who can help me?
Maybe this query could be a solution:
with x as (
    select record, string_agg(system_id::varchar, ',') as sys_id
    from records
    group by record
)
select records.*
from records, x
where records.record = x.record
  and x.sys_id = '1,5'
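The string_agg comparison above depends on the aggregation order, so an approach based on EXISTS / NOT EXISTS may be more robust. A sketch, assuming the junction table is system_records(record_id, system_id) as in the question's own attempt, and treating "include" as having at least one of the included systems:
select r.id
from records r
where exists (select 1
              from system_records si
              where si.record_id = r.id
                and si.system_id in (1, 5))   -- has at least one included system
  and not exists (select 1
                  from system_records se
                  where se.record_id = r.id
                    and se.system_id in (3)); -- and none of the excluded systems
On the sample data this returns only record 2, since record 1 also carries the excluded system 3.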

PSQL filter each group of rows

Recently I've faced a pretty rare filtering case in PostgreSQL.
My question is: how do I filter out redundant elements in each group of a grouped table?
For example, we have the following table:
id | group_idx | filter_idx
---+-----------+-----------
 1 |         1 | x
 2 |         3 | z
 3 |         3 | x
 4 |         2 | x
 5 |         1 | x
 6 |         3 | x
 7 |         2 | x
 8 |         1 | z
 9 |         2 | z
Firstly, to group rows:
SELECT group_idx FROM table
GROUP BY group_idx;
But how can I filter the redundant rows (filter_idx = 'z') from each group after grouping?
P.S. I can't just write it like this, because I need to find the groups first:
SELECT group_idx FROM table
WHERE filter_idx <> 'z';
Thanks.
Assuming that you want to see all groups at all times, even when you filter out all records of some group:
drop table if exists test cascade;
create table test (id integer, group_idx integer, filter_idx character);
insert into test
(id,group_idx,filter_idx)
values
(1,1,'x'),
(2,3,'z'),
(3,3,'x'),
(4,2,'x'),
(5,1,'x'),
(6,3,'x'),
(7,2,'x'),
(8,1,'z'),
(9,2,'z'),
(0,4,'y');--added an example of a group that would be discarded using WHERE.
Get groups in one query, filter your rows in another, then left join the two.
select groups.group_idx,
       string_agg(filtered_rows.filter_idx, ',')
from (select distinct group_idx from test) groups
left join (select group_idx, filter_idx from test where filter_idx <> 'y') filtered_rows
  using (group_idx)
group by 1;
-- group_idx | string_agg
-- ----------+------------
--         3 | z,x,x
--         4 |
--         2 | x,x,z
--         1 | x,x,z
-- (4 rows)
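For the filter the question actually asks about (dropping filter_idx = 'z'), the same left-join pattern might look like this (a sketch against the test data above):
select groups.group_idx,
       string_agg(filtered_rows.filter_idx, ',') as remaining
from (select distinct group_idx from test) groups
left join (select group_idx, filter_idx from test where filter_idx <> 'z') filtered_rows
  using (group_idx)
group by 1;
-- Groups 1, 2 and 3 keep their 'x' rows, group 4 keeps its 'y' row, and no group disappears.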

Get the ID of a table and its modulo with respect to the total rows in the same table in Postgres

While trying to map some data to a table, I wanted to obtain the ID of each row and its modulo with respect to the total number of rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
1 | 1 <- 1 mod 4
3 | 3 <- 3 mod 4
10 | 2 <- 10 mod 4
12 | 0 <- 12 mod 4
Is there an easy way to achieve this dynamically (as in, not counting the rows beforehand, but doing it atomically in one query)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works, but you must have the GROUP BY and the extra tbl t2; otherwise it returns 0 for the mod column. That makes sense, because I think it works by cross-joining the table with itself so that each ID gets a full copy of the table. I guess this is fine for small enough tables, but I can see how it becomes problematic for larger ones.
Edit: Found another hack-ish way:
WITH total AS (
SELECT COUNT(*) cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt mod FROM tbl t1, total t2
It's similar to the previous query, but it "collapses" the multiplication into a single row holding the count.
You can use the COUNT() window function:
SELECT id,
       id % COUNT(*) OVER () mod
FROM tbl;
I'm sure that the optimizer is smart enough to calculate the result of the window function only once.
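To check the behaviour against the sample ids without creating a table, a throwaway VALUES list can stand in for tbl (a quick sketch):
SELECT id,
       id % COUNT(*) OVER () AS mod
FROM (VALUES (1), (3), (10), (12)) AS tbl(id);
-- id | mod
-- ---+----
--  1 |   1
--  3 |   3
-- 10 |   2
-- 12 |   0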

Subsetting records that contain multiple values in one column

In my Postgres table, I have two columns of interest: id and name. My goal is to only keep records where an id has more than one value in name. In other words, I would like to keep all records of ids that have multiple distinct name values, where at least one of those values is B.
UPDATE: I have tried adding WHERE EXISTS to the queries below, but this does not work.
The sample data would look like this:
> test
   id name
1   1    A
2   2    A
3   3    A
4   4    A
5   5    A
6   6    A
7   7    A
8   2    B
9   1    B
10  2    B
and the output would look like this:
> output
   id name
1   1    A
2   2    A
8   2    B
9   1    B
10  2    B
How would one write a query to select only these kinds of records?
Based on your description you would seem to want:
select id, name
from (select t.*,
             min(name) over (partition by id) as min_name,
             max(name) over (partition by id) as max_name
      from test t
     ) t
where min_name < max_name;
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
              from test t2
              where t1.id = t2.id
                and t1.name <> t2.name) -- this selects ids that have multiple names
  and exists (select *
              from test t3
              where t1.id = t3.id
                and t3.name = 'B')      -- this selects ids that have at least one B
Those are the records whose id shows up with more than one name, right?
This could be formulated in SQL as follows:
select *
from test t1
where id in (
    select id
    from test t2
    group by id
    having count(distinct name) > 1
)
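Since the question also requires that at least one of the names is B, both conditions can be checked in a single aggregate pass. A sketch, assuming the table is named test as in the sample data:
select t.id, t.name
from test t
join (select id
      from test
      group by id
      having count(distinct name) > 1   -- more than one distinct name per id
         and bool_or(name = 'B')        -- and at least one of them is 'B'
     ) multi on multi.id = t.id;
On the sample data this keeps the rows for ids 1 and 2, matching the expected output.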

PostgreSQL: set a column with the ordinal of the row sorted via another field

I have a table segnature describing an item with a varchar field deno and a numeric field ord. A foreign key fk_collection tells which collection the row is part of.
I want to update field ord so that it contains the ordinal of that row per each collection, sorted by field deno.
E.g. if I have something like
[deno] ord [fk_collection]
abc        10
aab        10
bcd        10
zxc        20
vbn        20
Then I want a result like
[deno] ord [fk_collection]
abc    1   10
aab    0   10
bcd    2   10
zxc    1   20
vbn    0   20
I tried with something like
update segnature s1
set ord = (select count(*)
           from segnature s2
           where s1.fk_collection = s2.fk_collection
             and s2.deno < s1.deno)
but the query is really slow: about 150 collections with roughly 30,000 items take around 10 minutes to update.
Any suggestion to speed up the process?
Thank you!
You can use a window function to generate the "ordinal" number:
with numbered as (
    select deno, fk_collection,
           -- subtract 1 so the ordinal starts at 0, matching the example in the question
           row_number() over (partition by fk_collection order by deno) - 1 as rn,
           ctid as id
    from segnature
)
update segnature
set ord = n.rn
from numbered n
where n.id = segnature.ctid;
This uses the internal column ctid to uniquely identify each row. The ctid comparison is quite slow, so if you have a real primary (or unique) key in that table, use that column instead.
Alternatively without the common table expression:
update segnature
set ord = n.rn
from (
    select deno, fk_collection,
           row_number() over (partition by fk_collection order by deno) - 1 as rn,
           ctid as id
    from segnature
) as n
where n.id = segnature.ctid;
SQLFiddle example: http://sqlfiddle.com/#!15/e997f/1
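As noted above, a real key beats ctid. A minimal sketch, assuming a hypothetical primary key column id on segnature (the question does not name one):
update segnature
set ord = n.rn
from (
    select id,
           row_number() over (partition by fk_collection order by deno) - 1 as rn
    from segnature
) as n
where n.id = segnature.id;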