PostgreSQL trigger to avoid duplicate values in jsonb

I need to check if a value inside a jsonb column is already present in an array. I'm trying to achieve this with a trigger, but I'm new to this language and I don't know how to write the query.
CREATE TABLE merchants (
key uuid PRIMARY KEY,
data jsonb NOT NULL
)
Here is the trigger. I think the NEW.data.ids part is wrong.
CREATE FUNCTION validate_id_constraint() returns trigger as $$
DECLARE merchants_count int;
BEGIN
merchants_count := (SELECT count(*) FROM merchants WHERE data->'ids' @> NEW.data.ids);
IF (merchants_count != 0) THEN
RAISE EXCEPTION 'Duplicate id';
END IF;
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER validate_id_constraint_trigger BEFORE INSERT OR UPDATE ON merchants
FOR EACH ROW EXECUTE PROCEDURE validate_id_constraint();
When I insert into the table I get this error message:
ERROR: missing FROM-clause entry for table "data"
LINE 1: ...LECT count(*) FROM merchants WHERE data->'ids' @> NEW.data.i...
^
I have run the query outside the trigger and it works fine:
SELECT count(*) FROM merchants WHERE data->'ids' @> '["11176", "11363"]'

You get that error because you are using . instead of -> to extract the ids array: the expression should be NEW.data->'ids', not NEW.data.ids.
But your trigger won't work anyway, because you are not trying to avoid containment, but overlap between the arrays.
One way you could write the trigger function is:
CREATE OR REPLACE FUNCTION validate_id_constraint() RETURNS trigger
LANGUAGE plpgsql AS
$$DECLARE
j jsonb;
BEGIN
FOR j IN
SELECT jsonb_array_elements(NEW.data->'ids')
LOOP
IF EXISTS
(SELECT 1 FROM merchants WHERE j <@ (data->'ids'))
THEN
RAISE EXCEPTION 'Duplicate IDs';
END IF;
END LOOP;
RETURN NEW;
END;$$;
You have to loop because there is no “overlaps” operator on jsonb arrays.
This is all slow and cumbersome because of your table design.
Note 1: You would be much better off if you only store data in jsonb that you do not need to manipulate in the database. In particular, you should store the ids array as a regular array column in the table. Then you can use the “overlaps” operator && and speed that up with a GIN index.
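A minimal sketch of that design (assuming the ids are stored as text):
CREATE TABLE merchants (
    key  uuid PRIMARY KEY,
    ids  text[] NOT NULL, -- extracted from the jsonb
    data jsonb
);
-- a GIN index supports the && operator
CREATE INDEX merchants_ids_gin ON merchants USING gin (ids);
-- the loop in the trigger function then collapses to a single indexed check:
-- IF EXISTS (SELECT 1 FROM merchants WHERE ids && NEW.ids) THEN ...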
You would be even faster if you normalized the table structure and stored the individual array entries in a separate table; then a regular unique constraint would do.
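Sketched out (table and column names are mine):
CREATE TABLE merchant_ids (
    id           text PRIMARY KEY, -- the unique constraint does all the work
    merchant_key uuid NOT NULL REFERENCES merchants (key)
);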
Note 2: Any constraint enforced by a trigger suffers from a race condition: if two concurrent INSERTs conflict with each other, the trigger function will not see the values from the concurrent INSERT, and you may end up with inconsistent data.
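If you must stay with the trigger, one common mitigation (a sketch, at the price of serializing all writers on the table) is to take an explicit lock at the top of the trigger function:
LOCK TABLE merchants IN SHARE ROW EXCLUSIVE MODE;
-- this mode conflicts with itself and with the ROW EXCLUSIVE lock taken by
-- INSERT/UPDATE, so two concurrent inserts cannot both pass the check unseen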

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique ids. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite a few databases with different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are interpreted as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys, and then deletes these rows from the table my_source_schema.my_source_table.
See the test result in dbfiddle.
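A call might look like this (argument values assumed):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1','2','3']);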
As has been commented, you need some kind of dynamic SQL, in a FUNCTION, PROCEDURE or DO statement, to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id column!
The variable _sql holds the query string, so it can easily be debugged before actually executing. Using RAISE NOTICE '%', _sql; for that purpose.
I suggest to comment the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since it has to find the rows only once, it comes at about half the cost of a separate SELECT and DELETE. And it's perfectly safe: if anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
_target_table := _source_table || '_' || array_to_string(_unique_key, '_');
raise notice 'the deleted table records will be stored in %.%', _target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows affected: %', _row_count;
end
$$;
If your _unique_key array is small, this solution also creates the target table for you (named after the source table and the key values). Obviously you need to create the target schema yourself.
If your _unique_key array is large, you may want to customize how the dumped table is named.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} are the unique keys you want to extract to the new table, and s2 is the target schema.

How to make a PostgreSQL constraint only apply to a new value

I'm new to PostgreSQL and really loving how constraints work with row level security, but I'm confused about how to make them do what I want.
I have a text column and I want to add a constraint that enforces a minimum length for it; this check works for that:
(length((column_name):: text) > 6)
BUT, it also then prevents users from updating any rows where column_name is already under 6 characters.
I want to make it so they can't change the value TO something that short, but can still update a row where that is already the case, so they can change it as needed according to my new policy.
Is this possible?
BUT, it also then prevents users from updating any rows where column_name is already under 6 characters.
Well, no. When you try to add that CHECK constraint, all existing rows are checked, and an exception is raised if any violation is found.
You would have to make it NOT VALID. Then yes.
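For example (table and constraint names assumed):
ALTER TABLE tbl ADD CONSTRAINT column_name_min_len
CHECK (length(column_name::text) > 6) NOT VALID;
-- existing rows are not checked at creation time,
-- but every later INSERT and UPDATE is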
You really need a trigger on INSERT or UPDATE that checks new values. Not as cheap and not as bullet-proof, but still pretty solid. Like:
CREATE OR REPLACE FUNCTION trg_col_min_len6()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
IF TG_OP = 'UPDATE'
AND OLD.column_name IS NOT DISTINCT FROM NEW.column_name THEN
-- do nothing
ELSE
RAISE EXCEPTION 'New value for column "column_name" must have at least 7 characters.';
END IF;
RETURN NEW;
END
$func$;
-- trigger
CREATE TRIGGER tbl1_column_name_min_len6
BEFORE INSERT OR UPDATE ON tbl
FOR EACH ROW
WHEN (length(NEW.column_name) < 7)
EXECUTE FUNCTION trg_col_min_len6();
db<>fiddle here
It should be most efficient to check in a WHEN condition to the trigger directly. Then the trigger function is only ever called for short values and can be super simple.
See:
Trigger with multiple WHEN conditions
Fire trigger on update of columnA or ColumnB or ColumnC
You can create separate triggers for INSERT and UPDATE, letting each completely define when it should fire. If completely different logic is required for each DML action, this technique allows writing dedicated trigger functions. In this case that is not required; the trigger function reduces to raise exception .... See Demo
-- Single trigger function for both Insert and Update
create or replace function trg_col_min_len6()
returns trigger
language plpgsql
as $$
begin
raise exception 'Cannot % val = ''%''. Must have at least 6 characters.'
, tg_op, new.val;
return null;
end;
$$;
-- trigger before insert
create trigger tbl_val_min_len6_bir
before insert
on tbl
for each row
when (length(new.val) < 6)
execute function trg_col_min_len6();
-- trigger before update
create trigger tbl_val_min_len6_bur
before update
on tbl
for each row
when ( length(new.val) < 6
and new.val is distinct from old.val
)
execute function trg_col_min_len6();

Trigger | how to delete row instead of update based on cell value

PostgreSQL 10/11.
I need to delete a row instead of updating it in case the target cell value is null.
So I created this trigger function:
CREATE OR REPLACE FUNCTION delete_on_update_related_table() RETURNS trigger
AS $$
DECLARE
refColumnName text = TG_ARGV[0];
BEGIN
IF TG_NARGS <> 1 THEN
RAISE EXCEPTION 'Trigger function expects 1 parameters, but got %', TG_NARGS;
END IF;
EXECUTE 'DELETE FROM ' || TG_TABLE_NAME || ' WHERE $1 = ''$2'''
USING refColumnName, OLD.id;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
And a BEFORE UPDATE trigger:
CREATE TRIGGER proper_delete
BEFORE UPDATE OF def_id
ON public.definition_products
FOR EACH ROW
WHEN (NEW.def_id IS NULL)
EXECUTE PROCEDURE delete_on_update_related_table('def_id');
Table is simple:
id uuid primary key
def_id uuid not null
Test:
UPDATE definition_products SET
def_id = NULL
WHERE id = 'f47415e8-6b00-4c65-aeb8-cadc15ca5890';
-- rows affected 0
Documentation says:
Row-level triggers fired BEFORE can return null to signal the trigger
manager to skip the rest of the operation for this row (i.e.,
subsequent triggers are not fired, and the INSERT/UPDATE/DELETE does
not occur for this row).
Previously, I used a RULE instead of the trigger. But there is no way to use a WHERE clause and a RETURNING clause in the same rule.
You need an unconditional ON UPDATE DO INSTEAD rule with a RETURNING clause
So, is there a way?
While Jeremy's answer is good, there is still room for improvement.
Problems
You need to be very accurate in the definition of the objective. Your statement:
I need to delete a row instead of updating it in case the target cell value is null.
... does not imply that the column was changed to NULL in the UPDATE at hand. It might have been NULL before, like before you implemented the trigger. So not:
BEFORE UPDATE OF def_id ON public.definition_products
But just:
BEFORE UPDATE ON public.definition_products
Of course, if the column is defined NOT NULL (as it probably should be), there is no effective difference - except for the noise and an additional point of failure. The manual:
A column-specific trigger (one defined using the UPDATE OF column_name syntax) will fire when any of its columns are listed as targets in the UPDATE command's SET list. It is possible for a column's value to change even when the trigger is not fired, because changes made to the row's contents by BEFORE UPDATE triggers are not considered.
Also, nothing in your question indicates the need for dynamic SQL. (That would be the case if you wanted to reuse the same trigger function for multiple triggers on different tables. And even then it's often better to just create several distinct trigger functions for multiple reasons: simpler, faster, less error-prone, easier to read & maintain, ...)
As for "error-prone": your original dynamic statement was just invalid:
EXECUTE 'DELETE FROM ' || TG_TABLE_NAME || ' WHERE $1 = ''$2'''
USING refColumnName, OLD.id;
Can't pass a column name as value (refColumnName).
Can't put single quotes around $2, which is passed as a value and hence needs no quoting.
An unqualified, unquoted TG_TABLE_NAME can go terribly wrong, which is especially critical for a heavy-weight function that deletes rows.
Jeremy's version fixes most, but still features the unqualified TG_TABLE_NAME.
This would be good:
EXECUTE format('DELETE FROM %s WHERE %I = $1', TG_RELID::regclass, refColumnName) -- %I quotes refColumnName where necessary
USING OLD.id;
Or:
EXECUTE format('DELETE FROM %I.%I WHERE %I = $1', TG_TABLE_SCHEMA, TG_TABLE_NAME, refColumnName)
USING OLD.id;
Related:
Why does a PostgreSQL SELECT query return different results when a schema name is specified?
Table name as a PostgreSQL function parameter
Solution
Simpler trigger function:
CREATE OR REPLACE FUNCTION delete_on_update_related_table()
RETURNS trigger AS
$func$
BEGIN
DELETE FROM public.definition_products WHERE id = OLD.id; -- def_id?
RETURN NULL;
END
$func$ LANGUAGE plpgsql;
Simpler trigger:
CREATE TRIGGER proper_delete
BEFORE UPDATE ON public.definition_products
FOR EACH ROW
WHEN (NEW.def_id IS NULL) -- that's the defining condition!
EXECUTE PROCEDURE delete_on_update_related_table(); -- no parameter
You probably want to use OLD.id, not OLD.def_id. (The row to delete is best identified by its PK, not by the column changed to NULL.) But that's not entirely clear.
This works for me, with a few small changes:
CREATE OR REPLACE FUNCTION delete_on_update_related_table() RETURNS trigger
AS $$
DECLARE
refColumnName text = quote_ident(TG_ARGV[0]);
BEGIN
IF TG_NARGS <> 1 THEN RAISE EXCEPTION 'Trigger function expects 1 parameters, but got %', TG_NARGS; END IF;
EXECUTE format('DELETE FROM %s WHERE %s = %s', quote_ident(TG_TABLE_NAME), refColumnName, quote_literal(OLD.id));
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- create trigger
CREATE TRIGGER proper_delete
BEFORE UPDATE OF def_id
ON public.definition_products
FOR EACH ROW
WHEN (NEW.def_id IS NULL)
EXECUTE PROCEDURE delete_on_update_related_table('id'); --Note id, not def_id

PL/pgSQL trigger table entry limit

I'd like to get an opinion on a trigger I've written for a PostgreSQL database in PL/pgSQL. I haven't done this before and would like to get suggestions from more experienced users.
Task is simple enough:
Reduce the number of entries in a table to a set amount.
What should happen:
An INSERT into the table device_position occurs,
If the number of entries with a specific column (deviceid) value exceeds 50, delete the oldest.
Repeat
Please let me know if you see any obvious flaws:
CREATE OR REPLACE FUNCTION trim_device_positions() RETURNS trigger AS $trim_device_positions$
DECLARE
devicePositionCount int;
maxDevicePos CONSTANT int=50;
aDeviceId device_position.id%TYPE;
BEGIN
SELECT count(*) INTO devicePositionCount FROM device_position WHERE device_position.deviceid=NEW.deviceid;
IF devicePositionCount>maxDevicePos THEN
FOR aDeviceId IN SELECT id FROM device_position WHERE device_position.deviceid=NEW.deviceid ORDER BY device_position.id ASC LIMIT devicePositionCount-maxDevicePos LOOP
DELETE FROM device_position WHERE device_position.id=aDeviceId;
END LOOP;
END IF;
RETURN NULL;
END;
$trim_device_positions$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trim_device_positions_trigger ON device_position;
CREATE TRIGGER trim_device_positions_trigger AFTER INSERT ON device_position FOR EACH ROW EXECUTE PROCEDURE trim_device_positions();
Thanks for any wisdom coming my way :)
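One observation: the count-and-loop can collapse into a single set-based DELETE inside the trigger function. A sketch using the names above (with maxDevicePos inlined as 50):
DELETE FROM device_position
WHERE deviceid = NEW.deviceid
AND id IN (
    SELECT id
    FROM device_position
    WHERE deviceid = NEW.deviceid
    ORDER BY id DESC
    OFFSET 50 -- keep only the 50 newest rows per deviceid
);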

PostgreSQL trigger raises error 55000

After migrating from PostgreSQL server version 9 to 8.4, I have encountered a very strange error.
Short description:
If there is a FOR EACH ROW BEFORE INSERT OR UPDATE trigger on a given table, and its function uses a TG_OP value check together with the OLD record in a conditional statement (if-else), the following error is raised when doing an INSERT:
ERROR: record "old" is not assigned yet
DETAIL: The tuple structure of a not-yet-assigned record is indeterminate.
Detailed description:
There is following DB structure:
CREATE TABLE table1
(
id serial NOT NULL,
name character varying(256),
CONSTRAINT table1_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
CREATE OR REPLACE FUNCTION exemplary_function()
RETURNS trigger AS
$BODY$ BEGIN
IF TG_OP = 'INSERT' OR OLD.name <> NEW.name THEN
NEW.name = 'someName';
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
CREATE TRIGGER trigger1
BEFORE INSERT OR UPDATE
ON table1
FOR EACH ROW EXECUTE PROCEDURE exemplary_function();
and the following SQL query that triggers the error:
INSERT INTO table1 (name) VALUES ('other name')
It seems like the parser is not stopping at the TG_OP = 'INSERT' condition (and it should, because it is true) but checks the other one, and that triggers the error.
What's interesting, I was only able to reproduce it on version 8.4.
Postgres doesn't officially short-circuit boolean expressions (unlike C, for example).
The docs do say that it can sometimes decide to short-circuit, but it might just as easily decide to short-circuit on the second expression rather than the first.
It basically looks at how complicated the expressions on each side are before deciding the evaluation order. Then, if the side evaluated first settles the result, it can decide not to bother with the other side.
In this case, it looks like it's trying to interpret OLD while it's still trying to decide the best order in which to evaluate the expression.
You should be able to get around this by using a CASE to split the expressions, e.g.:
IF (CASE WHEN TG_OP = 'INSERT' THEN TRUE ELSE OLD.name <> NEW.name END) THEN
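Applied to the function above, the full workaround might look like this (a sketch):
CREATE OR REPLACE FUNCTION exemplary_function()
RETURNS trigger AS
$BODY$ BEGIN
-- CASE guarantees OLD is only referenced when TG_OP is not 'INSERT'
IF (CASE WHEN TG_OP = 'INSERT' THEN TRUE ELSE OLD.name <> NEW.name END) THEN
    NEW.name = 'someName';
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;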