I have a PL/pgSQL function (triggering when I UPDATE a row) that includes this statement:
IF NEW IS NOT DISTINCT FROM OLD THEN
RETURN OLD;
END IF;
Is there any way I can exclude a particular column from the IS DISTINCT FROM comparison?
Related
I have checked similar threats but not match exactly that I am trying to do.
I have two identical tables:
t_data: we will have the data of the last two months. There will be a job for delete data older than two months.
t_data_historical: we will have all data.
I want to do: If an INSERT is done into t_data, I want to "clone" the same query to t_data_historical.
I have seen that can be done with triggers but there is something that I don't understand: in the trigger definition I need to know the columns and values and I don't want to care about the INSERT query (I trust of the client that to do the query), I only clone it.
If you know that the history table will have the same definition, and the name is always constructed that way, you can
EXECUTE format('INSERT INTO %I.%I VALUES (($1).*)',
TG_TABLE_SCHEMA,
TG_TABLE_NAME || '_historical')
USING NEW;
in your trigger function.
That will work for all tables, and you don't need to know the actual columns.
Laurenz,
I created the trigger like:
CREATE OR REPLACE FUNCTION copy2historical() RETURNS trigger AS
$BODY$
BEGIN
EXECUTE format('INSERT INTO %I.%I VALUES (($1).*)', TG_TABLE_SCHEMA, TG_TABLE_NAME || '_historical')
USING NEW;
END;
RETURN NULL;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE TRIGGER copy2historical_tg
AFTER INSERT
ON t_data
FOR EACH ROW
EXECUTE PROCEDURE copy2historical();
thanks for your help.
I'm trying to EXECUTE some SELECTs to use inside a function, my code is something like this:
DECLARE
result_one record;
BEGIN
EXECUTE 'WITH Q1 AS
(
SELECT id
FROM table_two
INNER JOINs, WHERE, etc, ORDER BY... DESC
)
SELECT Q1.id
FROM Q1
WHERE, ORDER BY...DESC';
RETURN final_result;
END;
I know how to do it in MySQL, but in PostgreSQL I'm failing. What should I change or how should I do it?
For a function to be able to return multiple rows it has to be declared as returns table() (or returns setof)
And to actually return a result from within a PL/pgSQL function you need to use return query (as documented in the manual)
To build dynamic SQL in Postgres it is highly recommended to use the format() function to properly deal with identifiers (and to make the source easier to read).
So you need something like:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
'with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc', p_sort_column);
end;
$$
language plpgsql;
Note that the order by inside the CTE is pretty much useless if you are sorting the final query unless you use a LIMIT or distinct on () inside the query.
You can make your life even easier if you use another level of dollar quoting for the dynamic SQL:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
$query$
with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc
$query$, p_sort_column);
end;
$$
language plpgsql;
What a_horse said. And:
How to return result of a SELECT inside a function in PostgreSQL?
Plus, to pick a column for ORDER BY dynamically, you have to add that column to the SELECT list of your CTE, which leads to complications if the column can be duplicated (like with passing 'id') ...
Better yet, remove the CTE entirely. There is nothing in your question to warrant its use anyway. (Only use CTEs when needed in Postgres, they are typically slower than equivalent subqueries or simple queries.)
CREATE OR REPLACE FUNCTION get_data(p_sort_column text)
RETURNS TABLE (id integer) AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$
SELECT t2.id -- assuming you meant t2?
FROM table_two t2
JOIN table_three t3 on ...
ORDER BY t2.%I DESC NULL LAST -- see below!
$q$, $1);
END
$func$ LANGUAGE plpgsql;
I appended NULLS LAST - you'll probably want that, too:
PostgreSQL sort by datetime asc, null first?
If p_sort_column is from the same table all the time, hard-code that table name / alias in the ORDER BY clause. Else, pass the table name / alias separately and auto-quote them separately to be safe:
Define table and column names as arguments in a plpgsql function?
I suggest to table-qualify all column names in a bigger query with multiple joins (t2.id not just id). Avoids various kinds of surprising results / confusion / abuse.
And you may want to schema-qualify your table names (myschema.table_two) to avoid similar troubles when calling the function with a different search_path:
How does the search_path influence identifier resolution and the "current schema"
I need to check if a value inside a jsonb is already present in a array. I'm trying to achieve this with a trigger but i'm new to this language and I don't know how to write the query.
CREATE TABLE merchants (
key uuid PRIMARY KEY,
data jsonb NOT NULL
)
Here is the trigger. I think the NEW.data.ids part is wrong.
CREATE FUNCTION validate_id_constraint() returns trigger as $$
DECLARE merchants_count int;
BEGIN
merchants_count := (SELECT count(*) FROM merchants WHERE data->'ids' #> NEW.data.ids);
IF (merchants_count != 0) THEN
RAISE EXCEPTION 'Duplicate id';
END IF;
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER validate_id_constraint_trigger BEFORE INSERT OR UPDATE ON merchants
FOR EACH ROW EXECUTE PROCEDURE validate_id_constraint();
When i insert into the table i get this error message
ERROR: missing FROM-clause entry for table "data"
LINE 1: ...LECT count(*) FROM merchants WHERE data->'ids' #> NEW.data.i...
^
I have done the query outside the trigger and it works fine
SELECT count(*) FROM merchants WHERE data->'ids' #> '["11176", "11363"]'
You get that error because you are using . instead of -> to extract the ids array in the expression NEW.data.ids.
But your trigger won't work anyway because you are not trying to avoid containment, but overlaps in the arrays.
One way you could write the trigger function is:
CREATE OR REPLACE FUNCTION validate_id_constraint() RETURNS trigger
LANGUAGE plpgsql AS
$$DECLARE
j jsonb;
BEGIN
FOR j IN
SELECT jsonb_array_elements(NEW.data->'ids')
LOOP
IF EXISTS
(SELECT 1 FROM merchants WHERE j <# (data->'ids'))
THEN
RAISE EXCEPTION 'Duplicate IDs';
END IF;
END LOOP;
RETURN NEW;
END;$$;
You have to loop because there is no “overlaps” operator on jsonb arrays.
This is all slow and cumbersome because of your table design.
Note 1: You would be much better off if you only store data in jsonb that you do not need to manipulate in the database. In particular, you should store the ids array as field in the table. Then you can use the “overlaps” operator && and speed that up with a gin index.
You would be even faster if you normalized the table structure and stored the individual array entries in a separate table, then a regular unique constraint would do.
Note 2: Any constraint enabled by a trigger suffers from a race condition: if two concurrent INSERTs conflict with each other, the trigger function will not see the values from the concurrent INSERT and you may end up with inconsistent data.
My idea is to implement a basic «vector clock», where a timestamps are clock-based, always go forward and are guaranteed to be unique.
For example, in a simple table:
CREATE TABLE IF NOT EXISTS timestamps (
last_modified TIMESTAMP UNIQUE
);
I use a trigger to set the timestamp value before insertion. It basically just goes into the future when two inserts arrive at the same time:
CREATE OR REPLACE FUNCTION bump_timestamp()
RETURNS trigger AS $$
DECLARE
previous TIMESTAMP;
current TIMESTAMP;
BEGIN
previous := NULL;
SELECT last_modified INTO previous
FROM timestamps
ORDER BY last_modified DESC LIMIT 1;
current := clock_timestamp();
IF previous IS NOT NULL AND previous >= current THEN
current := previous + INTERVAL '1 milliseconds';
END IF;
NEW.last_modified := current;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tgr_timestamps_last_modified ON timestamps;
CREATE TRIGGER tgr_timestamps_last_modified
BEFORE INSERT OR UPDATE ON timestamps
FOR EACH ROW EXECUTE PROCEDURE bump_timestamp();
I then run a massive amount of insertions in two separate clients:
DO
$$
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO timestamps DEFAULT VALUES;
END LOOP;
END;
$$;
As expected, I get collisions:
ERROR: duplicate key value violates unique constraint "timestamps_last_modified_key"
État SQL :23505
Détail :Key (last_modified)=(2016-01-15 18:35:22.550367) already exists.
Contexte : SQL statement "INSERT INTO timestamps DEFAULT VALUES"
PL/pgSQL function inline_code_block line 4 at SQL statement
#rach suggested to mix current_clock() with a SEQUENCE object, but it would probably imply getting rid of the TIMESTAMP type. Even though I can't really figure out how it'd solve the isolation problem...
Is there a common pattern to avoid this?
Thank you for your insights :)
If you have only one Postgres server as you said, I think that using timestamp + sequence can solve the problem because sequence are non transactional and respect the insert order.
If you have db shard then it will be much more complex but maybe the distributed sequence of 2ndquadrant in BDR could help but I don't think that ordinality will be respected. I added some code below if you have setup to test it.
CREATE SEQUENCE "timestamps_seq";
-- Let's test first, how to generate id.
SELECT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0') as unique_id ;
unique_id
--------------------------------
145288519200000000000000000010
(1 row)
CREATE TABLE IF NOT EXISTS timestamps (
unique_id TEXT UNIQUE NOT NULL DEFAULT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0')
);
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
select * from timestamps;
unique_id
--------------------------------
145288556900000000000000000001
145288557000000000000000000002
145288557100000000000000000003
(3 rows)
Let me know if that works. I'm not a DBA so maybe it will be good to ask on dba.stackexchange.com too about the potential side effect.
My two cents (Inspired from http://tapoueh.org/blog/2013/03/15-batch-update).
try adding the following before massive amount of insertions:
LOCK TABLE timestamps IN SHARE MODE;
Official documentation is here: http://www.postgresql.org/docs/current/static/sql-lock.html
I have a table called assignments. I would like to be able to read/write to all the columns in this table using either assignments.column or homework.column, how can I do this?
I know this is not something you would normally do. I need to be able to do this to provide backwards compatibility for a short period of time.
We have an iOS app that currently does direct postgresql queries against the DB. We're updating all of our apps to use an API. In the process of building the API the developer decided to change the name of the tables because we (foolishly) thought we didn't need backwards compatibility.
Now, V1.0 and the API both need to be able to write to this table so I don't have to do some voodoo later to transfer/combine data later...
We're using Ruby on Rails for the API.
With Postgres 9.3 the following should be enough:
CREATE VIEW homework AS SELECT * FROM assignments;
It works because simple views are automatically updatable (see docs).
In Postgres 9.3 or later, a simple VIEW is "updatable" automatically. The manual:
Simple views are automatically updatable: the system will allow
INSERT, UPDATE and DELETE statements to be used on the view in
the same way as on a regular table. A view is automatically updatable
if it satisfies all of the following conditions:
The view must have exactly one entry in its FROM list, which must be a table or another updatable view.
The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.
The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.
The view's select list must not contain any aggregates, window functions or set-returning functions.
If one of these conditions is not met (or for the now outdated Postgres 9.2 or older), a manual setup may do the job.
Building on your work in progress:
Trigger function
CREATE OR REPLACE FUNCTION trg_ia_insupdel()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl CONSTANT regclass := 'iassignments_assignments';
_cols text;
_vals text;
BEGIN
CASE TG_OP
WHEN 'INSERT' THEN
INSERT INTO iassignments_assignments
VALUES (NEW.*);
RETURN NEW;
WHEN 'UPDATE' THEN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ') -- incl. pk col!
, string_agg('n.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl -- _tbl converted to oid automatically
AND attnum > 0 -- no system columns
AND NOT attisdropped; -- no dropped (dead) columns
EXECUTE format('
UPDATE %s t
SET (%s) = (%s)
FROM (SELECT ($1).*) n
WHERE t.published_assignment_id
= ($2).published_assignment_id' -- match to OLD value of pk
, _tbl, _cols, _vals) -- _tbl converted to text automatically
USING NEW, OLD;
RETURN NEW;
WHEN 'DELETE' THEN
DELETE FROM iassignments_assignments
WHERE published_assignment_id = OLD.published_assignment_id;
RETURN OLD;
END CASE;
RETURN NULL; -- control should never reach this
END
$func$;
Trigger
CREATE TRIGGER insupbef
INSTEAD OF INSERT OR UPDATE OR DELETE ON assignments_published
FOR EACH ROW EXECUTE PROCEDURE trg_ia_insupdel();
Notes
assignments_published must be a VIEW, an INSTEAD OF trigger is only allowed for views.
Dynamic SQL (in the UPDATE section) is not strictly necessary, only to cover future changes to the table layout automatically. The names of table and PK are still hard coded.
Simpler and probably cheaper without sub-block (like you had).
Using (SELECT ($1).*) instead of the shorter VALUES ($1.*) to preserve column names.
My naming convention: I prepend trg_ for trigger functions, followed by an abbreviation indicating the target table and finally one or more of the the tokens ins, up and del for INSERT, UPDATE and DELETE respectively. The name of the trigger is a copy of the function name, stripped of the first two parts. This is purely a matter of convention and taste but has proven useful for me since the names tell the purpose and are still short.
More explanation in the related answer that has already been mentioned:
Update multiple columns in a trigger function in plpgsql
This is where I am with the trigger functions so far, any feedback would be greatly appreciated. It's a combination of http://vibhorkumar.wordpress.com/2011/10/28/instead-of-trigger/ and Update multiple columns in a trigger function in plpgsql
Table: iassignments_assignments
Columns:
published_assignment_id
name
filepath
filename
link
teacher
due date
description
published
classrooms
View: assignments_published - SELECT * FROM iassignments_assignments
Trigger Function for assignments_published
CREATE OR REPLACE FUNCTION assignments_published_trigger_func()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
BEGIN
IF TG_OP = 'INSERT' THEN
EXECUTE format('INSERT INTO %s SELECT ($1).*', 'iassignments_assignments')
USING NEW;
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
DECLARE
tbl = 'iassignments_assignments';
cols text;
vals text;
BEGIN
SELECT INTO cols, vals
string_agg(quote_ident(attname), ', ')
,string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s t
SET (%s) = (%s)
FROM (SELECT ($1).*) x
WHERE t.published_assignment_id = ($2).published_assignment_id'
, tbl, cols, vals)
USING NEW, OLD;
RETURN NEW;
END
ELSIF TG_OP = 'DELETE' THEN
DELETE FROM iassignments_assignments WHERE published_assignment_id=OLD.published_assignment_id;
RETURN NULL;
END IF;
RETURN NEW;
END;
$function$;
Trigger
CREATE TRIGGER assignments_published_trigger
INSTEAD OF INSERT OR UPDATE OR DELETE ON
assignments_published FOR EACH ROW EXECUTE PROCEDURE assignments_published_trigger_func();
Table: iassignments_classes
Columns:
class_assignment_id
guid
assignment_published_id
View: assignments_class - SELECT * FROM assignments_classes
Trigger Function for assignments_class
**I'll create this function once I have received feedback on the other and know it's create, so I (hopefully) need very little changes to this function.
Trigger
CREATE TRIGGER assignments_class_trigger
INSTEAD OF INSERT OR UPDATE OR DELETE ON
assignments_class FOR EACH ROW EXECUTE PROCEDURE assignments_class_trigger_func();