PSQL filter each group of rows - postgresql

Recently I faced a rather unusual filtering case in PSQL.
My question is: how do I filter out redundant elements in each group of a grouped table?
For example, say we have the following table:
 id | group_idx | filter_idx
----+-----------+------------
  1 |         1 | x
  2 |         3 | z
  3 |         3 | x
  4 |         2 | x
  5 |         1 | x
  6 |         3 | x
  7 |         2 | x
  8 |         1 | z
  9 |         2 | z
First, to group the rows:
SELECT group_idx FROM test
GROUP BY group_idx;
But how can I filter out the redundant rows (filter_idx = 'z') from each group after grouping?
P.S. I can't simply write the following, because I need to find the groups first:
SELECT group_idx FROM test
WHERE filter_idx <> 'z';
Thanks.

Assuming that you want to see all groups at all times, even when you filter out all records of some group:
drop table if exists test cascade;
create table test (id integer, group_idx integer, filter_idx character);
insert into test
(id,group_idx,filter_idx)
values
(1,1,'x'),
(2,3,'z'),
(3,3,'x'),
(4,2,'x'),
(5,1,'x'),
(6,3,'x'),
(7,2,'x'),
(8,1,'z'),
(9,2,'z'),
(0,4,'y');--added an example of a group that would be discarded using WHERE.
Get groups in one query, filter your rows in another, then left join the two.
select groups.group_idx,
string_agg(filtered_rows.filter_idx,',')
from
(select distinct group_idx from test) groups
left join
(select group_idx,filter_idx from test where filter_idx<>'y') filtered_rows
using (group_idx)
group by 1;
-- group_idx | string_agg
-------------+------------
-- 3 | z,x,x
-- 4 |
-- 2 | x,x,z
-- 1 | x,x,z
--(4 rows)
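As an alternative to the join (not part of the answer above), an aggregate FILTER clause can drop rows inside each group after grouping, so every group still appears in the output. This is a minimal sketch assuming PostgreSQL 9.4 or later; the table name group_filter_demo is hypothetical, while the rows and the filter_idx <> 'y' condition mirror the answer:

```sql
-- Hypothetical table holding the same rows as the answer's "test" table.
DROP TABLE IF EXISTS group_filter_demo;
CREATE TEMP TABLE group_filter_demo (id integer, group_idx integer, filter_idx character);
INSERT INTO group_filter_demo (id, group_idx, filter_idx) VALUES
  (1, 1, 'x'), (2, 3, 'z'), (3, 3, 'x'), (4, 2, 'x'), (5, 1, 'x'),
  (6, 3, 'x'), (7, 2, 'x'), (8, 1, 'z'), (9, 2, 'z'), (0, 4, 'y');

-- GROUP BY keeps every group; FILTER only decides which rows feed the
-- aggregate. A group whose rows are all filtered out yields NULL.
SELECT group_idx,
       string_agg(filter_idx, ',' ORDER BY id) FILTER (WHERE filter_idx <> 'y') AS kept
FROM group_filter_demo
GROUP BY group_idx
ORDER BY group_idx;
```

Groups 1 and 2 come back as x,x,z, group 3 as z,x,x, and group 4 as NULL. Using filter_idx <> 'z' instead drops the rows the question asks about.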

Related

Get the ID of each row and its modulo with respect to the total rows in the same table in Postgres

While trying to map some data to a table, I wanted to obtain the ID of each row and its modulo with respect to the total number of rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
1 | 1 <- 1 mod 4
3 | 3 <- 3 mod 4
10 | 2 <- 10 mod 4
12 | 0 <- 12 mod 4
Is there an easy way to achieve this dynamically (that is, without counting the rows beforehand or doing it in an atomic way)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works, but it needs both the GROUP BY and the second reference tbl t2: without tbl t2, each group holds a single row, so COUNT(t1.id) is 1 and the mod column is always 0. It works because the cross join pairs each ID with a full copy of the table. I guess this is fine for small tables, but the cross join produces n² rows, so I can see how it becomes problematic for larger ones.
Edit: Found another hack-ish way:
WITH total AS (
SELECT COUNT(*) cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt mod FROM tbl t1, total t2;
It is similar to the previous query, but it "collapses" the cross join to a single row holding the count.
You can use the COUNT() window function:
SELECT id,
id % COUNT(*) OVER () mod
FROM tbl;
With an empty OVER () the window is the whole result set, and I'm sure the optimizer is smart enough to calculate the count only once rather than once per row.
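Another single-statement option, not taken from the answer above, is an uncorrelated scalar subquery: because it does not reference the outer row, PostgreSQL evaluates it once (it appears as an InitPlan in EXPLAIN) instead of once per row. A sketch using the ids from the question; mod_demo is a hypothetical stand-in for tbl:

```sql
-- Hypothetical stand-in for tbl, loaded with the ids from the question.
DROP TABLE IF EXISTS mod_demo;
CREATE TEMP TABLE mod_demo (id integer PRIMARY KEY);
INSERT INTO mod_demo (id) VALUES (1), (3), (10), (12);

-- The subquery does not depend on the current row, so its count is
-- computed once and reused for every row.
SELECT id,
       id % (SELECT count(*) FROM mod_demo) AS mod
FROM mod_demo
ORDER BY id;
```

This returns 1, 3, 2 and 0 for ids 1, 3, 10 and 12, matching the expected result in the question.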

Is there something similar to column_id in PostgreSQL?

Just as Oracle Database has column_id in all_tab_columns, is there a similar field in PostgreSQL?
For example, in Oracle we can order by column_id when selecting from that view. Is there a similar query in PostgreSQL?
The column attnum in pg_attribute shows the order of a column in a table.
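A sketch of querying pg_attribute directly, using a hypothetical table column_order_demo. User columns are numbered from 1; system columns have negative attnum, and dropped columns stay in the catalog with attisdropped set, so both are filtered out:

```sql
-- Hypothetical table used only to illustrate the catalog query.
DROP TABLE IF EXISTS column_order_demo;
CREATE TEMP TABLE column_order_demo (c_first integer, c_second text, c_third date);

-- attnum > 0 skips system columns such as ctid and xmin;
-- NOT attisdropped skips columns removed with ALTER TABLE ... DROP COLUMN.
SELECT attname, attnum
FROM pg_attribute
WHERE attrelid = 'column_order_demo'::regclass
  AND attnum > 0
  AND NOT attisdropped
ORDER BY attnum;
```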
Yes, you can also order by column number in Postgres:
with sampledata(a,b) as (values (1,'z'),(2,'y'),(3,'x'))
SELECT a,b
FROM sampledata
ORDER BY 1;
a | b
---+---
1 | z
2 | y
3 | x
(3 rows)
with sampledata(a,b) as (values (1,'z'),(2,'y'),(3,'x'))
SELECT a,b
FROM sampledata
ORDER BY 2;
a | b
---+---
3 | x
2 | y
1 | z
(3 rows)
The closest equivalent to all_tab_columns is information_schema.columns.
So you are probably looking for something along these lines:
select column_name
from information_schema.columns
where table_name = '...'
order by ordinal_position;
In PostgreSQL, let's say you have a table named Table with the following columns:
Col 1,
Col 2,
Col 3
You can reorder the columns in a query's result by listing them in the order you want (names containing spaces or capitals must be double-quoted):
SELECT
"Col 3",
"Col 2",
"Col 1"
FROM
"Table";

Postgres, get two row values that are both linked to the same ID

I have a rather tricky database problem that has really stumped me; I would appreciate any help.
I have a table which includes data from multiple different sources. This data from different sources can be ‘duplicated’ and we have ways of identifying if that is the case.
Each row in the table has an ‘id’, and if it is identified as a duplicate of another row then we merge it, and it is given a ‘merged_into_id’ which refers to another row in the same table.
I am trying to run a report which will return information about where we have identified duplicates from two of those different sources.
Let's say I have three sources: A, B and C. I want to identify all of the duplicate rows between source A and source B.
I have the query working fine when a row from source A is merged directly into a row from source B. However, we also have instances in the DB where a source A row AND a source B row are both merged into a source C row. I am struggling with these and was hoping someone could help.
An example:
Original DB:
 id | source | merged_into_id
----+--------+----------------
  1 | A      |              3
  2 | B      |              3
  3 | C      | NULL
What I would like to do is return id 1 and id 2 from that table, as they are both merged into the same ID, e.g.:
 source_a_id | source_b_id
-------------+-------------
           1 |           2
But I'm really struggling to get there; all I've managed to do is create a parent and child link like the following:
 parent_id | child_id | child_source
-----------+----------+--------------
         3 |        1 | A
         3 |        2 | B
I can also return just the IDs that I want, but they don't 'join' so to speak:
e.g.
SELECT
CASE WHEN child_source = 'A' THEN child_id END AS source_a_id,
CASE WHEN child_source = 'B' THEN child_id END AS source_b_id
But that just gives me rows with an empty (NULL) value for the 'missing' data.
---EDIT---
Using array_agg and array_to_string I've gotten a little closer to what I need:
SELECT
parent.id as parent_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'A' THEN child.id END)
, ','
) a_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'B' THEN child.id END)
, ','
) b_id
But it's not quite the right format, as I can occasionally have multiple versions from each source, so I get a table that looks like this:
 parent_id | a_id | b_id
-----------+------+-------
         3 | 1    | 2,4,5
In this case, I want to return a table that looks like:
 parent_id | a_id | b_id
-----------+------+------
         3 |    1 |    2
         3 |    1 |    4
         3 |    1 |    5
Does anyone have any advice on getting to my desired output? Many thanks
Suppose that we have this table:
select * from t;
id | source | merged_into_id
----+--------+----------------
1 | A | 3
2 | B | 3
3 | C |
5 | B | 3
4 | B | 3
(5 rows)
This should do the job:
WITH B_source as (select * from t where source = 'B'),
A_source as (select * from t where source = 'A')
SELECT merged_into_id,A_source.id as a_id,B_source.id as b_id
FROM A_source
INNER JOIN B_source using (merged_into_id);
Result
merged_into_id | a_id | b_id
----------------+------+------
3 | 1 | 2
3 | 1 | 5
3 | 1 | 4
(3 rows)
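For comparison, the same pairing can be written as a plain self-join on merged_into_id instead of two CTEs. A minimal, self-contained sketch that loads the answer's sample rows into a hypothetical temporary table merge_demo:

```sql
-- Hypothetical copy of the answer's table t.
DROP TABLE IF EXISTS merge_demo;
CREATE TEMP TABLE merge_demo (id integer, source text, merged_into_id integer);
INSERT INTO merge_demo (id, source, merged_into_id) VALUES
  (1, 'A', 3), (2, 'B', 3), (3, 'C', NULL), (5, 'B', 3), (4, 'B', 3);

-- Pair every source-A row with every source-B row that shares a merge target.
-- Rows with merged_into_id IS NULL never match, because NULL = NULL is not true.
SELECT a.merged_into_id, a.id AS a_id, b.id AS b_id
FROM merge_demo AS a
JOIN merge_demo AS b
  ON b.merged_into_id = a.merged_into_id
WHERE a.source = 'A'
  AND b.source = 'B'
ORDER BY a.merged_into_id, a.id, b.id;
```

Like the answer, this only covers rows merged into a common parent, not an A row merged directly into a B row.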

SQL Query for equal and opposite values

Suppose I have a table with two columns, id and val. I want to find all the distinct ids for which there exists a pair of equal and opposite vals. For example, suppose you have the following table:
id | val
------+------
1 | 3
2 | 5
2 | -5
1 | 4
2 | 6
3 | 9
2 | -6
3 | -9
I want the result to be:
result
--------
2
3
2 is in the result set because there are values 5, -5 and 6, -6; 3 is in the result set because of 9, -9.
I can do this using WHERE EXISTS, something like:
select distinct tab1.id from tab tab1
where exists (
select * from tab tab2
where tab1.id = tab2.id
and tab1.val = -tab2.val
);
However, I worry that a query like this has O(n^2) time complexity, because it might be executed as nested loops. It is possible to compute this in O(n) time by scanning the table once and keeping track of previously seen values in a data structure with O(1) lookup time. What is the optimal way to write such a query?
We would need to see the EXPLAIN output of your query and which indexes you have.
Maybe it could also be done like this:
WITH pos AS (
SELECT id, val FROM tab WHERE val > 0),
neg AS (
SELECT id, val FROM tab WHERE val < 0)
SELECT DISTINCT id
FROM pos JOIN neg USING (id)
WHERE pos.val = -neg.val;
With the right indexes this could be fast. It also depends on the volume of data.
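A separate sketch of the single-scan idea from the question (not either answer's method): group each id's values by absolute value and keep the groups that contain both signs. The aggregate reads the table once, and PostgreSQL may run it as a hash aggregate, although the plan is not guaranteed. One difference: in the EXISTS version a lone 0 matches itself, while here it does not count. The question's rows are loaded into a hypothetical table val_pairs_demo:

```sql
-- Hypothetical copy of the question's table tab.
DROP TABLE IF EXISTS val_pairs_demo;
CREATE TEMP TABLE val_pairs_demo (id integer, val integer);
INSERT INTO val_pairs_demo (id, val) VALUES
  (1, 3), (2, 5), (2, -5), (1, 4), (2, 6), (3, 9), (2, -6), (3, -9);

-- A (id, abs(val)) group holding both a positive and a negative value is an
-- equal-and-opposite pair; DISTINCT collapses ids with several such pairs.
SELECT DISTINCT id
FROM val_pairs_demo
GROUP BY id, abs(val)
HAVING bool_or(val > 0) AND bool_or(val < 0)
ORDER BY id;
```

For the sample data this returns 2 and 3, as in the question.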

Does the returning clause always execute first?

I have a many-to-many relation representing containers holding items.
I have a primary key row_id in the table.
I insert four rows with (container_id, item_id) values (1778712425160346751, 4). These rows are identical except for the aforementioned unique row_id.
I subsequently execute the following query:
delete from contains
where item_id = 4 and
container_id = '1778712425160346751' and
row_id =
(
select max(row_id) from contains
where container_id = '1778712425160346751' and
item_id = 4
)
returning
(
select count(*) from contains
where container_id = '1778712425160346751' and
item_id = 4
);
I expected this query to return 3, but I got 4. Getting 4 is actually the behavior I want, but it is not what I expected.
My question is: can I always expect that the returning clause executes before the delete, or is this an idiosyncrasy of certain versions or specific software?
Using a subquery in the RETURNING clause is allowed but not explicitly documented. From the documentation:
output_expression
An expression to be computed and returned by the DELETE command after each row is deleted. The expression can use any column names of the table named by table_name or table(s) listed in USING. Write * to return all columns.
It seems logical that the subquery sees the table as it was before the delete: it runs as part of the same, not yet completed statement, and a statement does not see its own modifications.
create temp table test as
select id from generate_series(1, 4) id;
delete from test
returning id, (select count(*) from test);
id | count
----+-------
1 | 4
2 | 4
3 | 4
4 | 4
(4 rows)
The same applies to UPDATE:
create temp table test as
select id from generate_series(1, 4) id;
update test
set id = id+ 1
returning id, (select sum(id) from test);
id | sum
----+-----
2 | 10
3 | 10
4 | 10
5 | 10
(4 rows)
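If the count after the delete is what you actually need, one single-statement option is a data-modifying CTE. The CTE and the main query share the same snapshot, so the main query still counts the pre-delete rows, and you can subtract the rows the DELETE reported. A sketch with a hypothetical contains_demo table standing in for contains:

```sql
-- Hypothetical stand-in for the "contains" table from the question.
DROP TABLE IF EXISTS contains_demo;
CREATE TEMP TABLE contains_demo (
  row_id       serial PRIMARY KEY,
  container_id bigint NOT NULL,
  item_id      integer NOT NULL
);
INSERT INTO contains_demo (container_id, item_id)
SELECT 1778712425160346751, 4 FROM generate_series(1, 4);

-- del deletes the row with the highest row_id. The outer count still sees
-- the pre-delete snapshot, so subtracting the deleted rows gives what remains.
WITH del AS (
  DELETE FROM contains_demo
  WHERE row_id = (SELECT max(row_id) FROM contains_demo
                  WHERE container_id = 1778712425160346751 AND item_id = 4)
  RETURNING row_id
)
SELECT (SELECT count(*) FROM contains_demo
        WHERE container_id = 1778712425160346751 AND item_id = 4)
     - (SELECT count(*) FROM del) AS remaining;
```

Here remaining is 3: four rows before the delete minus the one row returned by del.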