Postgres merge two rows with common array elements - postgresql

I have a postgres table with a column named "ids".
+----+--------------+
| id | ids          |
+----+--------------+
| 1  | {1, 2, 3}    |
| 2  | {2, 7, 10}   |
| 3  | {14, 11, 1}  |
| 4  | {12, 13}     |
| 5  | {15, 16, 12} |
+----+--------------+
I want to merge rows with at least one common array element and create a new row from that (or merge into one existing row). So finally the table would look like the following:
+----+--------------------------+
| id | ids                      |
+----+--------------------------+
| 6  | {1, 2, 3, 7, 10, 14, 11} |
| 7  | {12, 13, 15, 16}         |
+----+--------------------------+
Order of array elements in the resulting table does not really matter but they must be unique.
The rows are added independently by another system. For example, we could add a new row with ids {16, 18, 1}.
Right now, to make sure we combine all the rows with at least one common array element, I am doing the calculations on my server (Node.js).
So before I create a new row, I pull all the existing rows in the database that have at least one element in common, using:
await t.any('SELECT * FROM arraytable WHERE $1 && ids', [[16, 18, 1]])
(Note the nested array, so that pg-promise binds $1 as a single array value.) This gives me all the rows that contain at least one of 16, 18, or 1. Then I merge those rows with [16, 18, 1] and remove duplicates.
With this merged array in hand, I delete all the existing rows fetched above and insert the new row into the database. As you can see, most of the work is being done in Node.js.
Instead of this, I am trying to create a trigger which will do all these steps for me as soon as I add the new row. How do I go about doing this with a trigger? Also, are there better ways?
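For what it's worth, the whole fetch-merge-delete-insert round trip can also be collapsed into a single data-modifying CTE. A minimal sketch, assuming the arraytable name from the query above and the new ids {16, 18, 1}:
-- Merge the new ids with all overlapping rows, delete those rows,
-- and insert the combined array, all in one statement. The DELETE and
-- the SELECT share one snapshot, so the merge sees the pre-delete state.
WITH merged AS (
    SELECT ARRAY(
        SELECT DISTINCT e
        FROM (
            SELECT UNNEST(ids) AS e
            FROM arraytable
            WHERE ids && ARRAY[16, 18, 1]
            UNION ALL
            SELECT UNNEST(ARRAY[16, 18, 1])
        ) t
    ) AS ids
),
deleted AS (
    DELETE FROM arraytable WHERE ids && ARRAY[16, 18, 1]
)
INSERT INTO arraytable (ids)
SELECT ids FROM merged;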

Can a procedure suffice?
CREATE OR REPLACE PROCEDURE add_ids(new_ids INT[])
AS $$
DECLARE
    sum_array INT[];
BEGIN
    -- collect the elements of all rows that overlap the new ids
    SELECT ARRAY(SELECT UNNEST(ids) FROM table1 WHERE table1.ids && new_ids) INTO sum_array;
    sum_array := sum_array || new_ids;
    -- remove duplicates
    SELECT ARRAY(SELECT DISTINCT UNNEST(sum_array)) INTO sum_array;
    DELETE FROM table1 WHERE table1.ids && sum_array;
    INSERT INTO table1(ids) VALUES (sum_array);
END;
$$
LANGUAGE plpgsql;
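For reference, a procedure like this is invoked with CALL rather than SELECT:
CALL add_ids(ARRAY[16, 18, 1]);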
Unfortunately, inserting a row inside the trigger fires the trigger again, causing an infinite loop. I do not know a workaround for that.
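One possible workaround (a sketch, not from the original post): PostgreSQL 9.2+ provides pg_trigger_depth(), which returns 0 when the current statement is not running inside a trigger, so recursive firings can be filtered out in the trigger's WHEN clause:
-- Only fire for top-level inserts; inserts issued from inside the trigger
-- run at pg_trigger_depth() >= 1 and are skipped.
CREATE TRIGGER merge_ids_trigger
    BEFORE INSERT ON table1
    FOR EACH ROW
    WHEN (pg_trigger_depth() = 0)
    EXECUTE PROCEDURE merge_ids();  -- merge_ids() is a hypothetical trigger function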
PS. Sorry if creating another answer is bad practice. I want to leave it for now for reference. I will delete it when the problem is resolved.
Edit by pewpewlasers:
To prevent the loop, another table is probably needed. I have created a new temporary table, table2. New arrays can be added to this table. table2 has a trigger which does the calculations and saves the result to table1, and then deletes the temporarily created row.
CREATE OR REPLACE FUNCTION on_insert_temp() RETURNS TRIGGER AS $f$
DECLARE
    sum_array BIGINT[];
BEGIN
    -- collect the elements of all rows in table1 that overlap the new ids
    SELECT ARRAY(SELECT UNNEST(ids) FROM table1 WHERE table1.ids && NEW.ids) INTO sum_array;
    sum_array := sum_array || NEW.ids;
    -- remove duplicates
    SELECT ARRAY(SELECT DISTINCT UNNEST(sum_array)) INTO sum_array;
    DELETE FROM table1 WHERE table1.ids && sum_array;
    INSERT INTO table1(ids) VALUES (sum_array);
    -- clean up the temporary row
    DELETE FROM table2 WHERE id = NEW.id;
    -- OLD is not assigned in an INSERT trigger, and the return value of
    -- an AFTER ROW trigger is ignored anyway
    RETURN NULL;
END;
$f$ LANGUAGE plpgsql;
CREATE TRIGGER on_insert_temp AFTER INSERT ON table2 FOR EACH ROW EXECUTE PROCEDURE on_insert_temp();

Given the tables
CREATE TABLE table1(id serial, ids INT[]);
CREATE TABLE table2(id serial, ids INT[]);
the trigger can look like this:
CREATE OR REPLACE FUNCTION sum_tables_trigger() RETURNS TRIGGER AS $table1$
BEGIN
    INSERT INTO table2(ids)
    SELECT ARRAY(SELECT DISTINCT UNNEST(table1.ids || new.ids) ORDER BY 1)
    FROM table1
    WHERE table1.ids && new.ids;
    RETURN NEW;
END;
$table1$ LANGUAGE plpgsql;
CREATE TRIGGER sum_tables_trigger_ BEFORE INSERT ON table1
FOR EACH ROW EXECUTE PROCEDURE sum_tables_trigger();
tableA.ids && tableB.ids returns true if the arrays have at least one element in common.
tableA.ids || tableB.ids concatenates the arrays.
ARRAY(SELECT DISTINCT UNNEST(table1.ids || new.ids) ORDER BY 1) removes duplicates and sorts the result.
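These building blocks can be checked in isolation, for example:
SELECT ARRAY[1, 2, 3] && ARRAY[3, 4];  -- t (common element 3)
SELECT ARRAY[1, 2] || ARRAY[2, 3];     -- {1,2,2,3} (duplicates kept)
SELECT ARRAY(SELECT DISTINCT UNNEST(ARRAY[1, 2, 2, 3]) ORDER BY 1);  -- {1,2,3}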

Related

Dynamically get columns names using old in triggers

I want to write a generic trigger function (Postgres procedure). There are many main tables like TableA, TableB, etc., and their corresponding audit tables TableA_Audit, TableB_Audit, respectively. The structure is given below.
TableA and TableA_Audit have the columns aa integer, ab integer.
TableB and TableB_Audit have the column ba integer.
Similarly, there can be many main tables along with their audit tables.
The requirement is that if any of the main tables gets updated, the old row should be inserted into its respective audit table.
eg:- If TableA has entries like this:
TableA
 aa | ab
----+----
  5 | 10
and then I run an update like
UPDATE TableA SET aa = aa + 15;
then the old values for TableA should be inserted into the TableA_Audit table, like below.
TableA_Audit contains:
 aa | ab
----+----
  5 | 10
To facilitate the above scenario I have written a generic function called insert_in_audit. Whenever there is an update in any of the main tables, the function insert_in_audit should be called. The function should achieve the following:
Dynamically insert entries into the corresponding audit table of the main table. If there is an update in TableB, then entries should be inserted only into TableB_Audit.
So far, I have managed to get the names of all the columns of the main table where the update happened.
E.g. for the query update TableA set aa = aa + 15, I am able to get all the column names of TableA in a varchar array:
column_names varchar[] := '{"aa", "ab"}';
My question is: how do I get the old values of columns aa and ab? I tried this:
foreach i in array column_names
loop
    raise notice '%', old.i;
end loop;
But the above gave me the error: record "old" has no field "i". Can anyone help me get the old values?
Here is a code sample showing how you can dynamically extract values from OLD in PL/pgSQL:
CREATE FUNCTION dynamic_col() RETURNS trigger
    LANGUAGE plpgsql AS
$$DECLARE
    v_col name;
    v_val text;
BEGIN
    -- TG_ARGV holds the arguments given in CREATE TRIGGER
    FOREACH v_col IN ARRAY TG_ARGV
    LOOP
        -- build "SELECT (($1).colname)::text" and run it with OLD as $1
        EXECUTE format('SELECT (($1).%I)::text', v_col)
            USING OLD
            INTO v_val;
        RAISE NOTICE 'OLD.% = %', v_col, v_val;
    END LOOP;
    RETURN OLD;
END;$$;
CREATE TABLE trigtest (
    id integer,
    val text
);
INSERT INTO trigtest VALUES
    (1, 'one'), (2, 'two');
CREATE TRIGGER dynamic_col AFTER DELETE ON trigtest
    FOR EACH ROW EXECUTE FUNCTION dynamic_col('id', 'val');
DELETE FROM trigtest WHERE id = 1;
NOTICE: OLD.id = 1
NOTICE: OLD.val = one
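As an alternative sketch (not from the original answer; assumes PostgreSQL 9.4+): converting OLD to jsonb lets you look up columns by name without dynamic SQL:
-- Same loop body without EXECUTE: to_jsonb(OLD) turns the row into a
-- jsonb object keyed by column name; ->> extracts a value as text.
FOREACH v_col IN ARRAY TG_ARGV
LOOP
    v_val := to_jsonb(OLD) ->> v_col;
    RAISE NOTICE 'OLD.% = %', v_col, v_val;
END LOOP;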

Generating incremental numbers based on a different column

I have a composite primary key in a table in PostgreSQL (I am using pgAdmin4).
Let's call the two primary key columns productno and version.
version represents the version of productno.
So when I create a new dataset, it needs to be checked whether a dataset with this productno already exists:
If productno doesn't exist yet, then version should be 1
If productno exists once, then version should be 2
If productno exists twice, then version should be 3
... and so on
So that we get something like:
productno | version
----------+--------
        1 | 1
        1 | 2
        1 | 3
        2 | 1
        2 | 2
I found a quite similar problem: auto increment on composite primary key
But I can't use this solution because PostgreSQL syntax is obviously a bit different; I tried a lot with functions and triggers but couldn't figure out the right way to do it.
You can keep the version numbers in a separate table (one for each "base PK" value). That is way more efficient than doing a max() + 1 on every insert and has the additional benefit that it's safe for concurrent transactions.
So first we need a table that keeps track of the version numbers:
create table version_counter
(
    product_no integer primary key,
    version_nr integer not null
);
Then we create a function that increments the version for a given product_no and returns that new version number:
create function next_version(p_product_no int)
    returns integer
as
$$
    insert into version_counter (product_no, version_nr)
    values (p_product_no, 1)
    on conflict (product_no) do update
        set version_nr = version_counter.version_nr + 1
    returning version_nr;
$$
language sql
volatile;
The trick here is the insert ... on conflict, which increments an existing value or inserts a new row if the passed product_no does not yet exist.
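For illustration, calling the function directly shows the counter behaviour (a standalone test, not part of the original setup):
select next_version(42);  -- returns 1: a row (42, 1) is inserted
select next_version(42);  -- returns 2: the existing row is incremented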
For the product table:
create table product
(
    product_no integer not null,
    version_nr integer not null,
    created_at timestamp default clock_timestamp(),
    primary key (product_no, version_nr)
);
then create a trigger:
create function increment_version()
    returns trigger
as
$$
begin
    new.version_nr := next_version(new.product_no);
    return new;
end;
$$
language plpgsql;
create trigger base_table_insert_trigger
    before insert on product
    for each row
    execute procedure increment_version();
This is safe for concurrent transactions because the row in version_counter will be locked for that product_no until the transaction inserting the row into the product table is committed - which will commit the change to the version_counter table as well (and free the lock on that row).
If two concurrent transactions insert the same value for product_no, one of them will wait until the other finishes.
If two concurrent transactions insert different values for product_no, they can work without having to wait for the other.
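A sketch of that locking behaviour with two concurrent sessions:
-- Session 1:
BEGIN;
INSERT INTO product (product_no) VALUES (1);  -- locks the version_counter row for 1
-- Session 2, before session 1 commits:
INSERT INTO product (product_no) VALUES (1);  -- blocks until session 1 commits
INSERT INTO product (product_no) VALUES (2);  -- different product_no: proceeds immediately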
If we then insert these rows:
insert into product (product_no) values (1);
insert into product (product_no) values (2);
insert into product (product_no) values (3);
insert into product (product_no) values (1);
insert into product (product_no) values (3);
insert into product (product_no) values (2);
The product table looks like this:
select *
from product
order by product_no, version_nr;
product_no | version_nr | created_at
-----------+------------+------------------------
         1 |          1 | 2019-08-23 10:50:57.880
         1 |          2 | 2019-08-23 10:50:57.947
         2 |          1 | 2019-08-23 10:50:57.899
         2 |          2 | 2019-08-23 10:50:57.989
         3 |          1 | 2019-08-23 10:50:57.926
         3 |          2 | 2019-08-23 10:50:57.966
Online example: https://rextester.com/CULK95702
You can do it like this:
-- Check if the pk already exists
SELECT pk INTO temp_pk FROM table a WHERE a.pk = v_pk1;
-- If it exists, insert the new row with it
IF temp_pk IS NOT NULL THEN
    INSERT INTO table(pk, versionpk) VALUES (v_pk1, temp_pk);
END IF;
So - I got it to work now.
If you want a column to update depending on another column in PostgreSQL, have a look at this.
This is the function I use:
CREATE FUNCTION public.testfunction()
    RETURNS trigger
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    v_productno INTEGER := NEW.productno;
BEGIN
    IF NOT EXISTS (SELECT *
                   FROM testtable
                   WHERE productno = v_productno)
    THEN
        NEW.version := 1;
    ELSE
        NEW.version := (SELECT MAX(testtable.version) + 1
                        FROM testtable
                        WHERE testtable.productno = v_productno);
    END IF;
    RETURN NEW;
END;
$BODY$;
And this is the trigger that runs the function:
CREATE TRIGGER testtrigger
BEFORE INSERT
ON public.testtable
FOR EACH ROW
EXECUTE PROCEDURE public.testfunction();
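For illustration (assuming testtable has just the columns productno and version), inserting a few rows:
INSERT INTO testtable (productno) VALUES (1);  -- version = 1
INSERT INTO testtable (productno) VALUES (1);  -- version = 2
INSERT INTO testtable (productno) VALUES (2);  -- version = 1
Note that, unlike the version_counter approach above, MAX() + 1 inside a trigger is not safe against concurrent inserts of the same productno: two transactions can read the same MAX before either commits.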
Thank you @ChechoCZ, you definitely helped me get in the right direction.

Return the number of rows with a where condition after an INSERT INTO? postgresql

I have a table that records which users have joined which event (as in an IRL event).
I have set up a server query that lets a user join an event.
It goes like this :
INSERT INTO participations
VALUES (:usr, :event_id)
I want that statement to also return the number of people who have joined the same event as the user. How do I proceed, ideally in one SQL statement?
Thanks
You can use a common table expression to execute it as one query:
with insert_tbl_statement as (
    insert into tbl values (4, 1) returning event_id
)
select (count(*) + 1) as event_count
from tbl
where event_id = (select event_id from insert_tbl_statement);
The + 1 accounts for the newly inserted row: within a single statement, the outer select cannot see the row inserted by the CTE.
see demo http://rextester.com/BUF16406
You can also use a function. I've set up an example below, but keep in mind you must add 1 to the final count because the transaction hasn't been committed yet.
create table tbl(id int, event_id int);
insert into tbl values (1, 2),(2, 2),(3, 3);
create function new_tbl(id int, event_id int)
returns bigint as $$
    insert into tbl values ($1, $2);
    select count(*) + 1 from tbl where event_id = $2;
$$ language sql;
select new_tbl(4, 2);
| new_tbl |
| ------: |
|       4 |
db<>fiddle here

Postgres cascade delete on non-unique column

I have a table like this:
id | group_id | parent_group
---+----------+-------------
 1 |        1 | null
 2 |        1 | null
 3 |        2 | 1
 4 |        2 | 1
Is it possible to add a constraint such that a row is automatically deleted when there is no row with a group_id equal to the row's parent_group? For example, if I delete rows 1 and 2, I want rows 3 and 4 to be deleted automatically because there are no more rows with group_id 1.
The answer that clemens posted led me to the following solution. I'm not very familiar with triggers though; could there be any problems with this and is there a better way to do it?
CREATE OR REPLACE FUNCTION on_group_deleted() RETURNS TRIGGER AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM my_table WHERE group_id = OLD.group_id) THEN
        DELETE FROM my_table WHERE parent_group = OLD.group_id;
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE PLPGSQL;
CREATE TRIGGER my_table_delete_trigger AFTER DELETE ON my_table
FOR EACH ROW
EXECUTE PROCEDURE on_group_deleted();
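For illustration with the sample data above, deleting the rows of group 1 cascades:
DELETE FROM my_table WHERE group_id = 1;  -- removes rows 1 and 2
-- the AFTER DELETE trigger then removes rows 3 and 4 (parent_group = 1)
-- and fires again for those rows, cascading down any deeper levels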

Duplicate single database record

Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table where I am doing this has many columns, like 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
-- 1. duplicate the record
INSERT INTO history
SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;
-- 2. update some columns
UPDATE history
SET col_5 = 'test_5',
    col_23 = 'test_23',
    datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve:
Listing all these 100+ columns looks lame.
When a new column is eventually added, the function has to be updated too.
On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list the columns once more (solving issue 3) like insert into <table> (<columns_list>) values (<query>), but then the query looks even uglier.
I would like to achieve something like insert into <table> select * from <table>, but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.
This isn't pretty or particularly optimized but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
    cols text;
    insert_statement text;
BEGIN
    -- build list of columns, excluding the primary key
    SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
    FROM information_schema.columns
    WHERE (table_schema, table_name) = ('public', 'history')
    AND column_name <> 'history_id';
    -- build insert statement
    insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
    -- execute statement
    RETURN QUERY EXECUTE insert_statement USING p_history_id;
    RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
You don't need the update anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
       col_1,
       col_2,
       col_3,
       col_4,
       'test_5',
       ...
       'test_23',
       ...,
       col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;